Optimal Control Problems

Basics

Suppose that we are given a discrete-time stochastic dynamics model of the form

\[x_{k+1} = f(x_k, u_k, w_k),\]

where $x_k \in \mathbb{R}^n$ is the state, $u_k \in \mathbb{R}^m$ is the control input, and $w_k \in \mathbb{R}^r$ is the stochastic noise variable drawn from some probability distribution on $\mathbb{R}^r$.

Subject to the dynamics constraint, we are interested in minimizing an objective function $J$ over some finite horizon $N$:

\[J(x_{0:N}, u_{0:N-1}) \triangleq \sum_{k=0}^{N - 1} c(k, x_k, u_k) + h(x_N),\]

where $c(k, x_k, u_k) \geq 0$ is the stage cost at time $k$ and $h(x_N) \geq 0$ is the terminal cost.

Although not written explicitly above, the value of $J$ depends on the history of stochastic noise $w_{0:N-1}$. The objective $J$ is therefore itself a random variable whose value cannot be determined without observing the outcome. Stochastic Optimal Control thus seeks to minimize a statistic associated with the objective, such as the expectation $\mathbb{E}[J]$.
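
Since $\mathbb{E}[J]$ is generally not available in closed form, it can be approximated by Monte Carlo rollouts. The sketch below does this for a hypothetical 1D single-integrator system under a fixed feedback policy; all names and constants are illustrative, not part of RATiLQR.jl:

```julia
using Random, Statistics

# Hypothetical 1D system x_{k+1} = x_k + u_k + w_k with w_k ~ N(0, 0.01).
f(x, u, w) = x + u + w
c(k, x, u) = x^2 + u^2   # stage cost
h(x) = x^2               # terminal cost

function rollout_cost(rng; N=10, x0=1.0)
    x, J = x0, 0.0
    for k in 0:N-1
        u = -0.5 * x           # fixed feedback policy for illustration
        w = 0.1 * randn(rng)   # sample w_k ~ N(0, 0.01)
        J += c(k, x, u)
        x = f(x, u, w)
    end
    return J + h(x)
end

rng = MersenneTwister(1234)
J_est = mean(rollout_cost(rng) for _ in 1:10_000)  # Monte Carlo estimate of E[J]
```

Each rollout samples one noise history $w_{0:N-1}$ and returns one realization of $J$; averaging many rollouts estimates $\mathbb{E}[J]$.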

Supported Problem Types

With RATiLQR.jl, you can formulate various types of stochastic optimal control problems with different objective statistics and dynamics models.

1. Standard Problem with Gaussian Noise

  • Objective: $\mathbb{E}[J]$
  • Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k, ~ w_k \sim \mathcal{N}(0, W_k)$
  • Definition: FiniteHorizonRiskSensitiveOptimalControlProblem

2. Standard Problem with Arbitrary Noise

  • Objective: $\mathbb{E}[J]$
  • Dynamics: $x_{k+1} = f(x_k, u_k, w_k)$ where the noise $w_{0:N-1}$ can have an arbitrary distribution and the dynamics are accessed as a generative (sampling-based) model.
  • Definition: FiniteHorizonGenerativeOptimalControlProblem

3. Risk-Sensitive Problem with Gaussian Noise

  • Objective: $\frac{1}{\theta}\log\left(\mathbb{E}[\exp(\theta J)]\right)$ where $\theta > 0$ denotes the risk-sensitivity parameter and is user-specified.
  • Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k, ~ w_k \sim \mathcal{N}(0, W_k)$
  • Definition: FiniteHorizonRiskSensitiveOptimalControlProblem
  • Solvers: ILEQGSolver
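
To build intuition for this objective, note that Jensen's inequality gives $\frac{1}{\theta}\log\left(\mathbb{E}[\exp(\theta J)]\right) \geq \mathbb{E}[J]$ for $\theta > 0$, so the risk-sensitive statistic penalizes high-variance outcomes more heavily than the plain expectation. A minimal numerical sketch (the cost samples are purely illustrative):

```julia
using Statistics

θ = 0.5                              # risk-sensitivity parameter
J_samples = [1.0, 2.0, 3.0, 10.0]    # illustrative outcomes of the random cost J

risk_neutral   = mean(J_samples)                          # E[J] = 4.0
risk_sensitive = (1/θ) * log(mean(exp.(θ .* J_samples)))  # ≈ 7.34

# For θ > 0 the exponential statistic upweights the rare, costly outcome (10.0),
# so risk_sensitive exceeds risk_neutral.
```

As $\theta \to 0$ the risk-sensitive statistic recovers $\mathbb{E}[J]$, while larger $\theta$ places more weight on worst-case outcomes.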

4. Distributionally-Robust Problem

  • Objective: $\max_{p \in \Xi} \mathbb{E}_p [J]$ where $p$ denotes the true, potentially unknown distribution over the noise variables $w_{0:N-1}$ and $\Xi$ is the ambiguity set that encodes how uncertain we are about the true distribution. Note that $p$ may not be Gaussian. In our formulation, $\Xi$ is defined by a KL divergence bound between the true distribution $p(w_{0:N-1})$ and the Gaussian model $q(w_{0:N-1}) \triangleq \prod_{k = 0}^{N - 1} \mathcal{N}(0, W_k)$ as follows:

    \[\Xi \triangleq \{p: \mathbb{D}_\mathrm{KL}(p \Vert q) \leq d\},\]

    where $d > 0$ is a user-specified constant that defines an upper bound.
  • Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k$ where $w_{0:N-1}$ can have an arbitrary distribution as long as it is within the ambiguity set defined by the Gaussian model.
  • Definition: FiniteHorizonRiskSensitiveOptimalControlProblem
  • Solvers: CrossEntropyBilevelOptimizationSolver, NelderMeadBilevelOptimizationSolver
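
To make the ambiguity set concrete, the sketch below (a toy illustration, not part of the RATiLQR.jl API) checks membership in $\Xi$ for univariate Gaussian candidates $p$, using the closed-form KL divergence between Gaussians:

```julia
# KL divergence between univariate Gaussians p = N(μ, σp²) and q = N(0, σq²):
# D_KL(p ‖ q) = log(σq/σp) + (σp² + μ²)/(2σq²) − 1/2
kl_gauss(μ, σp, σq) = log(σq / σp) + (σp^2 + μ^2) / (2 * σq^2) - 0.5

d = 0.1                                        # user-specified KL bound defining Ξ
in_ambiguity_set(μ, σp; σq=1.0) = kl_gauss(μ, σp, σq) ≤ d

in_ambiguity_set(0.2, 1.0)   # true:  a small mean shift stays within Ξ
in_ambiguity_set(2.0, 1.0)   # false: a large mean shift leaves Ξ
```

Larger values of $d$ admit distributions that deviate further from the Gaussian model $q$, making the resulting controller more conservative.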

Problem Definition APIs

RATiLQR.FiniteHorizonRiskSensitiveOptimalControlProblem — Type
FiniteHorizonRiskSensitiveOptimalControlProblem(f, c, h, W, N) <: OptimalControlProblem

A finite horizon, stochastic optimal control problem where the dynamics function is subject to additive Gaussian noise w ~ N(0, W).

Arguments

  • f(x, u, f_returns_jacobian=false) – deterministic dynamics function
    • x is a state vector and u is a control input vector.
    • The third positional argument f_returns_jacobian determines whether the function computes and returns the Jacobians, and should default to false. If true, the function must return the tuple (x_next, A, B), where A = dx_next/dx and B = dx_next/du are the Jacobian matrices evaluated at (x, u). Otherwise the return value is the (noiseless) next state x_next alone.
  • c(k, x, u) – stage cost function
    • k::Int >= 0 is a time index where k == 0 is the initial time.
    • We assume that c is non-negative.
  • h(x) – terminal cost function
    • We assume that h is non-negative.
  • W(k) – covariance matrix function
    • Returns a symmetric positive semidefinite matrix that represents the covariance matrix for additive Gaussian noise w ~ N(0, W).
    • k::Int >= 0 is a time index where k == 0 is the initial time.
  • N::Int64 – final time index
    • Note that 0 is the initial time index.

Notes

  • Functions f, c, and h should be written generically enough to accept the state x and the input u of type Vector{<:Real}. This is to ensure that ForwardDiff can compute Jacobians and Hessians for iLQG/iLEQG.
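
A common pitfall that violates this requirement is allocating a buffer with a hard-coded Float64 element type inside f, which silently discards the element type (such as ForwardDiff's dual numbers) that automatic differentiation relies on. A minimal sketch, where f_bad and f_good are hypothetical illustrations rather than RATiLQR.jl code, and Rational inputs stand in for ForwardDiff's Dual numbers:

```julia
f_bad(x, u)  = (x_next = zeros(length(x)); x_next .= x .+ u; x_next)  # Float64-only buffer
f_good(x, u) = x .+ u   # result element type follows x and u

x = Rational{Int}[1, 2]
u = Rational{Int}[1//2, 1//2]

eltype(f_good(x, u))   # Rational{Int64}: element type preserved
eltype(f_bad(x, u))    # Float64: element type silently lost
```

Writing f, c, and h without type-restricted buffers or annotations keeps them compatible with the Vector{<:Real} inputs that ForwardDiff supplies.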

Example

import LinearAlgebra;

function f(x, u, f_returns_jacobian=false)
    x_next = x + u; # 2D single integrator dynamics
    if f_returns_jacobian
        A = Matrix(1.0LinearAlgebra.I, 2, 2); # dx_next/dx
        B = Matrix(1.0LinearAlgebra.I, 2, 2); # dx_next/du
        return x_next, A, B;
    else
        return x_next;
    end
end

c(k, x, u) = k/2*x'*x + k/2*u'*u  # time-dependent quadratic stage cost
N = 10;
h(x) = N/2*x'*x; # quadratic terminal cost
W(k) = Matrix(0.1LinearAlgebra.I, 2, 2);

problem = FiniteHorizonRiskSensitiveOptimalControlProblem(f, c, h, W, N);
RATiLQR.FiniteHorizonGenerativeOptimalControlProblem — Type
FiniteHorizonGenerativeOptimalControlProblem(f_stochastic, c, h, N) <: OptimalControlProblem

A finite horizon, stochastic optimal control problem where the dynamics function is stochastic and generative.

Arguments

  • f_stochastic(x, u, rng, use_true_model=false) – stochastic dynamics function

    • x is a state vector and u is a control input vector.
    • The third positional argument rng is a random number generator (e.g., an AbstractRNG object) used to sample the stochastic noise.
    • The fourth positional argument use_true_model determines whether a solver has access to the true stochastic dynamics and defaults to false.
    • The return value is the (noisy) next state x_next.
  • c(k, x, u) – stage cost function

    • k::Int >= 0 is a time index where k == 0 is the initial time.
    • We assume that c is non-negative.
  • h(x) – terminal cost function

    • We assume that h is non-negative.
  • N::Int64 – final time index

    • Note that 0 is the initial time index.

Example

import Distributions;
import LinearAlgebra;

function f_stochastic(x, u, rng, use_true_model=false)
    Σ_1 = Matrix(0.5LinearAlgebra.I, 2, 2);

    if use_true_model  # accurate GMM model
        Σ_2 = Matrix(1.0LinearAlgebra.I, 2, 2)
        d = Distributions.MixtureModel([Distributions.MvNormal(zeros(2), Σ_1),
                                        Distributions.MvNormal(ones(2), Σ_2)],
                                        [0.5, 0.5]);
    else  # inaccurate Gaussian model
        d = Distributions.MvNormal(zeros(2), Σ_1);
    end

    x_next = x + u + Distributions.rand(rng, d); # 2D single integrator dynamics
    return x_next;
end

c(k, x, u) = k/2*x'*x + k/2*u'*u  # time-dependent quadratic stage cost
N = 10;
h(x) = N/2*x'*x; # quadratic terminal cost

problem = FiniteHorizonGenerativeOptimalControlProblem(f_stochastic, c, h, N);