Optimal Control Problems

Basics

Suppose that we are given a discrete-time stochastic dynamics model of the form

\[x_{k+1} = f(x_k, u_k, w_k),\]

where $x_k \in \mathbb{R}^n$ is the state, $u_k \in \mathbb{R}^m$ is the control input, and $w_k \in \mathbb{R}^r$ is the stochastic noise variable drawn from some probability distribution on $\mathbb{R}^r$.

Subject to the dynamics constraint, we are interested in minimizing an objective function $J$ over some finite horizon $N$:

\[J(x_{0:N}, u_{0:N-1}) \triangleq \sum_{k=0}^{N - 1} c(k, x_k, u_k) + h(x_N),\]

where $c(k, x_k, u_k) \geq 0$ is the stage cost at time $k$ and $h(x_N) \geq 0$ is the terminal cost.

Although not written explicitly above, the value of $J$ depends on the history of stochastic noise $w_{0:N-1}$. The objective $J$ is therefore itself a random variable whose value cannot be determined without observing the outcome. Stochastic Optimal Control thus seeks to minimize a statistic associated with the objective, such as the expectation $\mathbb{E}[J]$.
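
Since $\mathbb{E}[J]$ is generally not available in closed form, it can be approximated by Monte Carlo rollouts. The sketch below does this for a hypothetical 1D single-integrator system under a fixed feedback policy; all names and constants are illustrative, not part of RATiLQR.jl:

```julia
using Random, Statistics

# Hypothetical 1D system x_{k+1} = x_k + u_k + w_k with w_k ~ N(0, 0.01).
f(x, u, w) = x + u + w
c(k, x, u) = x^2 + u^2   # stage cost
h(x) = x^2               # terminal cost

function rollout_cost(rng; N=10, x0=1.0)
    x, J = x0, 0.0
    for k in 0:N-1
        u = -0.5 * x           # fixed feedback policy for illustration
        w = 0.1 * randn(rng)   # sample w_k ~ N(0, 0.01)
        J += c(k, x, u)
        x = f(x, u, w)
    end
    return J + h(x)
end

rng = MersenneTwister(1234)
J_est = mean(rollout_cost(rng) for _ in 1:10_000)  # Monte Carlo estimate of E[J]
```

Each rollout samples one noise history $w_{0:N-1}$ and returns one realization of $J$; averaging many rollouts estimates $\mathbb{E}[J]$.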

Supported Problem Types

With RATiLQR.jl, you can formulate various types of stochastic optimal control problems with different objective statistics and dynamics models.

1. Standard Problem with Gaussian Noise

  • Objective: $\mathbb{E}[J]$
  • Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k, ~ w_k \sim \mathcal{N}(0, W_k)$
  • Definition: FiniteHorizonRiskSensitiveOptimalControlProblem

2. Standard Problem with Arbitrary Noise

  • Objective: $\mathbb{E}[J]$
  • Dynamics: $x_{k+1} = f(x_k, u_k, w_k)$ where the noise $w_{0:N-1}$ can have an arbitrary distribution and the dynamics are accessed as a generative (sampling-based) model.
  • Definition: FiniteHorizonGenerativeOptimalControlProblem

3. Risk-Sensitive Problem with Gaussian Noise

  • Objective: $\frac{1}{\theta}\log\left(\mathbb{E}[\exp(\theta J)]\right)$ where $\theta > 0$ denotes the risk-sensitivity parameter and is user-specified.
  • Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k, ~ w_k \sim \mathcal{N}(0, W_k)$
  • Definition: FiniteHorizonRiskSensitiveOptimalControlProblem
  • Solvers: ILEQGSolver
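
To build intuition for this objective, note that Jensen's inequality gives $\frac{1}{\theta}\log\left(\mathbb{E}[\exp(\theta J)]\right) \geq \mathbb{E}[J]$ for $\theta > 0$, so the risk-sensitive statistic penalizes high-variance outcomes more heavily than the plain expectation. A minimal numerical sketch (the cost samples are purely illustrative):

```julia
using Statistics

θ = 0.5                              # risk-sensitivity parameter
J_samples = [1.0, 2.0, 3.0, 10.0]    # illustrative outcomes of the random cost J

risk_neutral   = mean(J_samples)                          # E[J] = 4.0
risk_sensitive = (1/θ) * log(mean(exp.(θ .* J_samples)))  # ≈ 7.34

# For θ > 0 the exponential statistic upweights the rare, costly outcome (10.0),
# so risk_sensitive exceeds risk_neutral.
```

As $\theta \to 0$ the risk-sensitive statistic recovers $\mathbb{E}[J]$, while larger $\theta$ places more weight on worst-case outcomes.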

4. Distributionally-Robust Problem

  • Objective: $\max_{p \in \Xi} \mathbb{E}_p [J]$ where $p$ denotes the true, potentially unknown distribution over the noise variables $w_{0:N-1}$ and $\Xi$ is the ambiguity set that encodes how uncertain we are about the true distribution. Note that $p$ may not be Gaussian. In our formulation, $\Xi$ is defined by a KL divergence bound between the true distribution $p(w_{0:N-1})$ and the Gaussian model $q(w_{0:N-1}) \triangleq \prod_{k = 0}^{N - 1} \mathcal{N}(0, W_k)$ as follows:

    \[\Xi \triangleq \{p: \mathbb{D}_\mathrm{KL}(p \Vert q) \leq d\},\]

    where $d > 0$ is a user-specified constant that defines an upper bound.
  • Dynamics: $f(x_k, u_k, w_k) = f(x_k, u_k) + w_k$ where $w_{0:N-1}$ can have an arbitrary distribution as long as it is within the ambiguity set defined by the Gaussian model.
  • Definition: FiniteHorizonRiskSensitiveOptimalControlProblem
  • Solvers: CrossEntropyBilevelOptimizationSolver, NelderMeadBilevelOptimizationSolver
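
To make the ambiguity set concrete, the sketch below (a toy illustration, not part of the RATiLQR.jl API) checks membership in $\Xi$ for univariate Gaussian candidates $p$, using the closed-form KL divergence between Gaussians:

```julia
# KL divergence between univariate Gaussians p = N(μ, σp²) and q = N(0, σq²):
# D_KL(p ‖ q) = log(σq/σp) + (σp² + μ²)/(2σq²) − 1/2
kl_gauss(μ, σp, σq) = log(σq / σp) + (σp^2 + μ^2) / (2 * σq^2) - 0.5

d = 0.1                                        # user-specified KL bound defining Ξ
in_ambiguity_set(μ, σp; σq=1.0) = kl_gauss(μ, σp, σq) ≤ d

in_ambiguity_set(0.2, 1.0)   # true:  a small mean shift stays within Ξ
in_ambiguity_set(2.0, 1.0)   # false: a large mean shift leaves Ξ
```

Larger values of $d$ admit distributions that deviate further from the Gaussian model $q$, making the resulting controller more conservative.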

Problem Definition APIs

RATiLQR.FiniteHorizonRiskSensitiveOptimalControlProblem — Type
FiniteHorizonRiskSensitiveOptimalControlProblem(f, c, h, W, N) <: OptimalControlProblem

A finite horizon, stochastic optimal control problem where the dynamics function is subject to additive Gaussian noise w ~ N(0, W).

Arguments

  • f(x, u, f_returns_jacobian=false) – deterministic dynamics function
    • x is a state vector and u is a control input vector.
    • The third positional argument f_returns_jacobian determines whether the function computes and returns the Jacobians, and should default to false. If true, the function must return the tuple (x_next, A, B), where A = dx_next/dx and B = dx_next/du are the Jacobian matrices evaluated at (x, u). Otherwise the return value is the (noiseless) next state x_next alone.
  • c(k, x, u) – stage cost function
    • k::Int >= 0 is a time index where k == 0 is the initial time.
    • We assume that c is non-negative.
  • h(x) – terminal cost function
    • We assume that h is non-negative.
  • W(k) – covariance matrix function
    • Returns a symmetric positive semidefinite matrix that represents the covariance matrix for additive Gaussian noise w ~ N(0, W).
    • k::Int >= 0 is a time index where k == 0 is the initial time.
  • N::Int64 – final time index
    • Note that 0 is the initial time index.

Notes

  • Functions f, c, and h should be written generically enough to accept the state x and the input u of type Vector{<:Real}. This is to ensure that ForwardDiff can compute Jacobians and Hessians for iLQG/iLEQG.
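
A common pitfall that violates this requirement is allocating a buffer with a hard-coded Float64 element type inside f, which silently discards the element type (such as ForwardDiff's dual numbers) that automatic differentiation relies on. A minimal sketch, where f_bad and f_good are hypothetical illustrations rather than RATiLQR.jl code, and Rational inputs stand in for ForwardDiff's Dual numbers:

```julia
f_bad(x, u)  = (x_next = zeros(length(x)); x_next .= x .+ u; x_next)  # Float64-only buffer
f_good(x, u) = x .+ u   # result element type follows x and u

x = Rational{Int}[1, 2]
u = Rational{Int}[1//2, 1//2]

eltype(f_good(x, u))   # Rational{Int64}: element type preserved
eltype(f_bad(x, u))    # Float64: element type silently lost
```

Writing f, c, and h without type-restricted buffers or annotations keeps them compatible with the Vector{<:Real} inputs that ForwardDiff supplies.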

Example

import LinearAlgebra;

function f(x, u, f_returns_jacobian=false)
    x_next = x + u; # 2D single integrator dynamics
    if f_returns_jacobian
        A = Matrix(1.0LinearAlgebra.I, 2, 2); # dx_next/dx
        B = Matrix(1.0LinearAlgebra.I, 2, 2); # dx_next/du
        return x_next, A, B;
    else
        return x_next;
    end
end

c(k, x, u) = k/2*x'*x + k/2*u'*u  # time-dependent quadratic stage cost
N = 10;
h(x) = N/2*x'*x; # quadratic terminal cost
W(k) = Matrix(0.1LinearAlgebra.I, 2, 2);

problem = FiniteHorizonRiskSensitiveOptimalControlProblem(f, c, h, W, N);
RATiLQR.FiniteHorizonGenerativeOptimalControlProblem — Type
FiniteHorizonGenerativeOptimalControlProblem(f_stochastic, c, h, N) <: OptimalControlProblem

A finite horizon, stochastic optimal control problem where the dynamics function is stochastic and generative.

Arguments

  • f_stochastic(x, u, rng, use_true_model=false) – stochastic dynamics function

    • x is a state vector and u is a control input vector.
    • The third positional argument rng is a random number generator (e.g., an AbstractRNG object) used to sample the stochastic noise.
    • The fourth positional argument use_true_model determines whether a solver has access to the true stochastic dynamics and defaults to false.
    • The return value is the (noisy) next state x_next.
  • c(k, x, u) – stage cost function

    • k::Int >= 0 is a time index where k == 0 is the initial time.
    • We assume that c is non-negative.
  • h(x) – terminal cost function

    • We assume that h is non-negative.
  • N::Int64 – final time index

    • Note that 0 is the initial time index.

Example

import Distributions;
import LinearAlgebra;

function f_stochastic(x, u, rng, use_true_model=false)
    Σ_1 = Matrix(0.5LinearAlgebra.I, 2, 2);

    if use_true_model  # accurate GMM model
        Σ_2 = Matrix(1.0LinearAlgebra.I, 2, 2)
        d = Distributions.MixtureModel([Distributions.MvNormal(zeros(2), Σ_1),
                                        Distributions.MvNormal(ones(2), Σ_2)],
                                        [0.5, 0.5]);
    else  # inaccurate Gaussian model
        d = Distributions.MvNormal(zeros(2), Σ_1);
    end

    x_next = x + u + Distributions.rand(rng, d); # 2D single integrator dynamics
    return x_next;
end

c(k, x, u) = k/2*x'*x + k/2*u'*u  # time-dependent quadratic stage cost
N = 10;
h(x) = N/2*x'*x; # quadratic terminal cost

problem = FiniteHorizonGenerativeOptimalControlProblem(f_stochastic, c, h, N);