Optimal Control Problems
Basics
Suppose that we are given a discrete-time stochastic dynamics model of the form
\[x_{k+1} = f(x_k, u_k, w_k),\]
where $x_k \in \mathbb{R}^n$ is the state, $u_k \in \mathbb{R}^m$ is the control input, and $w_k \in \mathbb{R}^r$ is the stochastic noise variable drawn from some probability distribution on $\mathbb{R}^r$.
Subject to the dynamics constraint, we are interested in minimizing an objective function $J$ over some finite horizon $N$:
\[J(x_{0:N}, u_{0:N-1}) \triangleq \sum_{k=0}^{N - 1} c(k, x_k, u_k) + h(x_N),\]
where $c(k, x_k, u_k) \geq 0$ is the stage cost at time $k$ and $h(x_N) \geq 0$ is the terminal cost.
Although not explicitly written above, the actual value of $J$ is dependent on the history of stochastic noise $w_{0:N-1}$. This means that the objective $J$ itself is a random variable whose value cannot be determined without actually observing the outcome. Therefore, Stochastic Opimal Control seeks to minimize a statistic associated with the objective, such as the expectation $\mathbb{E}[J]$.
Supported Problem Types
With RATiLQR.jl, you can formulate various types of stochastic optimal control problems with different objective statistics and dynamics models.
1. Standard Problem with Gaussian Noise
- Objective: $\mathbb{E}[J]$
- Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k, ~ w_k \sim \mathcal{N}(0, W_k)$
- Definition:
FiniteHorizonRiskSensitiveOptimalControlProblem - Solvers:
ILEQGSolver
2. Standard Problem with Arbitrary Noise
- Objective: $\mathbb{E}[J]$
- Dynamics: $f(x_k, u_k, w_k)$ where $w_k$ has an arbitrary distribution.
- Definition:
FiniteHorizonGenerativeOptimalControlProblem - Solvers:
CrossEntropyDirectOptimizationSolver
3. Risk-Sensitive Problem with Gaussian Noise
- Objective: $\frac{1}{\theta}\log\left(\mathbb{E}[\exp(\theta J)]\right)$ where $\theta > 0$ denotes the risk-sensitivity parameter and is user-specified.
- Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k, ~ w_k \sim \mathcal{N}(0, W_k)$
- Definition:
FiniteHorizonRiskSensitiveOptimalControlProblem - Solvers:
ILEQGSolver
4. Distributionally-Robust Problem
- Objective: $\max_{p \in \Xi} \mathbb{E}_p [J]$ where $p$ denotes the true, potentially unknown distribution over the noise variables $w_{0:N-1}$ and $\Xi$ is the ambiguity set that encodes how uncertain we are about the true distribution. Note that $p$ may not be Gaussian. In our formulation, $\Xi$ is defined by a KL divergence bound between the true distribution $p(w_{0:N-1})$ and the Gaussian model $q(w_{0:N-1}) \triangleq \prod_{k = 0}^{N - 1} \mathcal{N}(0, W_k)$ as follows:
\[\Xi \triangleq \{p: \mathbb{D}_\mathrm{KL}(p \Vert q) \leq d\},\]
where $d > 0$ is a user-specified constant that defines an upper bound. - Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k$ where $w_{0:N-1}$ can have an arbitrary distribution as long as it is within the ambiguity set defined by the Gaussian model.
- Definition:
FiniteHorizonRiskSensitiveOptimalControlProblem - Solvers:
CrossEntropyBilevelOptimizationSolver,NelderMeadBilevelOptimizationSolver
Problem Definition APIs
RATiLQR.OptimalControlProblem — TypeOptimalControlProblemAbstract base type for an Optimal Control Problem.
RATiLQR.FiniteHorizonRiskSensitiveOptimalControlProblem — TypeFiniteHorizonRiskSensitiveOptimalControlProblem(f, c, h, W, N) <: OptimalControlProblemA finite horizon, stochastic optimal control problem where the dynamics function is subject to additive Gaussian noise w ~ N(0, W).
Arguments
f(x, u, f_returns_jacobian=false)– deterministic dynamics functionxis a state vector anduis a control input vector.- The third positional argument
f_returns_jacobiandetermines whether the user computes and returns the Jacobians, and should default tofalse. Iftrue, the return value must be augmented with matricesAandB, whereA = dx_next/dxandB = dx_next/du. Otherwise the return value is the (noiseless) next statex_next.
c(k, x, u)– stage cost functionk::Int >= 0is a time index wherek == 0is the initial time.- We assume that
cis non-negative.
h(x)– terminal cost function- We assume that
his non-negative.
- We assume that
W(k)– covariance matrix function- Returns a symmetric positive semidefinite matrix that represents the covariance matrix for additive Gaussian noise w ~ N(0, W).
k::Int >= 0is a time index wherek == 0is the initial time.
N::Int64– final time index- Note that
0is the initial time index.
- Note that
Notes
- Functions
f,c, andhshould be written generically enough to accept the statexand the inputuof typeVector{<:Real}. This is to ensure that ForwardDiff can compute Jacobians and Hessians for iLQG/iLEQG.
Example
import LinearAlgebra;
function f(x, u, f_returns_jacobian=false)
x_next = x + u; # 2D single integrator dynamics
if f_returns_jacobian
A = Matrix(1.0LinearAlgebra.I, 2, 2); # dx_next/dx
B = Matrix(1.0LinearAlgebra.I, 2, 2); # dx_next/du
return x_next, A, B;
else
return x_next;
end
end
c(k, x, u) = k/2*x'*x + k/2*u'*u # time-dependent quadratic stage cost
N = 10;
h(x) = N/2*x'*x; # quadratic terminal cost
W(k) = Matrix(0.1LinearAlgebra.I, 2, 2);
problem = FiniteHorizonRiskSensitiveOptimalControlProblem(f, c, h, W, N);RATiLQR.FiniteHorizonGenerativeOptimalControlProblem — TypeFiniteHorizonGenerativeOptimalControlProblem(f_stochastic, c, h, N) <: OptimalControlProblemA finite horizon, stochastic optimal control problem where the dynamics function is stochastic and generative.
Arguments
f_stochastic(x, u, rng, use_true_model=false)– stochastic dynamics functionxis a state vector anduis a control input vector.- The third positional argument
rngis a random seed. - The fourth positional argument
use_true_modeldetermines whether a solver has access to the true stochastic dynamics and defaults tofalse. - The return value is the (noisy) next state
x_next.
c(k, x, u)– stage cost functionk::Int >= 0is a time index wherek == 0is the initial time.- We assume that
cis non-negative.
h(x)– terminal cost function- We assume that
his non-negative.
- We assume that
N::Int64– final time index- Note that
0is the initial time index.
- Note that
Example
import Distributions;
import LinearAlgebra;
function f_stochastic(x, u, rng, use_true_model=false)
Σ_1 = Matrix(0.5LinearAlgebra.I, 2, 2);
if use_true_model # accurate GMM model
Σ_2 = Matrix(1.0LinearAlgebra.I, 2, 2)
d = Distributions.MixtureModel([Distributions.MvNormal(zeros(2), Σ_1),
Distributions.MvNormal(ones(2), Σ_2)],
[0.5, 0.5]);
else # inaccurate Gaussian model
d = Distributions.MvNormal(zeros(2), Σ_1);
end
x_next = x + u + Distributions.rand(rng, d); # 2D single integrator dynamics
return x_next;
end
c(k, x, u) = k/2*x'*x + k/2*u'*u # time-dependent quadratic stage cost
N = 10;
h(x) = N/2*x'*x; # quadratic terminal cost
W(k) = Matrix(0.1LinearAlgebra.I, 2, 2);
problem = FiniteHorizonGenerativeOptimalControlProblem(f_stochastic, c, h, N);