Linear Quadratic Regulator (LQR)

State-feedback design via pole placement requires the designer to assign the closed-loop poles by hand; LQR is a technique that places the closed-loop poles automatically and optimally.

## Problem description

  • consider the system $x(k+1)=Ax(k)+Bu(k)$ with initial condition $x(0)$; we look for the optimal sequence of inputs $U=\{u(0),\ u(1),\cdots,\ u(N-1)\}$ driving the state $x(k)$ towards the origin and minimizing the performance index

    $J(x(0),U)=x'(N)Q_Nx(N)+\sum\limits_{k=0}^{N-1} \left[ x'(k)Qx(k) + u'(k)Ru(k) \right]$ (quadratic cost)

    where $Q=Q' \succeq 0$, $R=R' \succ 0$, $Q_N=Q'_N \succeq 0$; here $\succeq 0$ denotes a positive semidefinite matrix and $\succ 0$ a positive definite one.

  • consider the problem of controlling the state to zero with minimum-energy input

    $\min_U \begin{Vmatrix} \begin{bmatrix} u(0)\\ u(1)\\ \vdots \\ u(N-1) \end{bmatrix} \end{Vmatrix}$

    s.t. $x(N)=0$

  • The minimum-energy control problem can be seen as a particular case of the LQ optimal control problem by setting $R=I, \ Q=0, \ Q_N=\infty \cdot I$

  • By substituting $x(k)=A^k x(0) + \sum\limits_{i=0}^{k-1} A^iBu(k-1-i)$ in

    $J(x(0),U)=\sum\limits_{k=0}^{N-1} \left[ x'(k)Qx(k) + u'(k)Ru(k) \right] +x'(N)Q_Nx(N)$

    we obtain $J(x(0),U)=\frac{1}{2}U'HU+x'(0)FU+\frac{1}{2}x'(0)Yx(0)$, where $H=H' \succ 0$ is positive definite (since $R \succ 0$)

## Solution 1: batch solution

  • The optimizer $U^*$ is obtained by zeroing the gradient

    $0=\nabla_U J(x(0),U)=HU+F'x(0)$

    $\rightarrow U^* = \begin{bmatrix} u^*(0)\\ u^*(1)\\ \vdots\\ u^*(N-1) \end{bmatrix} =-H^{-1}F'x(0)$

    this $U^*$ is an open-loop solution: $u^*(k)=f_k(x(0)), \ k=0,1,\cdots,N-1$

    • Moreover, the dimensions of the $H$ and $F$ matrices are proportional to the time horizon $N$
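The batch solution above can be sketched in Python/NumPy. The construction follows the stated convention $J=\frac{1}{2}U'HU+x'(0)FU+\frac{1}{2}x'(0)Yx(0)$; the function name `batch_lq` and the scalar example are illustrative, not from the notes:

```python
import numpy as np

def batch_lq(A, B, Q, R, QN, x0, N):
    """Open-loop (batch) LQ solution U* = -H^{-1} F' x(0)."""
    n, m = B.shape
    # Prediction matrices: [x(1); ...; x(N)] = S x0 + T U
    S = np.vstack([np.linalg.matrix_power(A, i) for i in range(1, N + 1)])
    T = np.zeros((N * n, N * m))
    for i in range(N):          # block row i -> x(i+1)
        for j in range(i + 1):  # block col j -> u(j)
            T[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B
    Qbar = np.kron(np.eye(N), Q)
    Qbar[-n:, -n:] = QN                        # terminal weight on x(N)
    Rbar = np.kron(np.eye(N), R)
    H = 2 * (Rbar + T.T @ Qbar @ T)            # H = H' > 0 when R > 0
    F = 2 * (S.T @ Qbar @ T)
    Ustar = -np.linalg.solve(H, F.T @ x0)      # zero the gradient: HU + F'x(0) = 0
    return Ustar.reshape(N, m)                 # one row per u*(k)
```

Note how the whole input sequence is computed at once from $x(0)$, and the sizes of $H$ and $F$ grow with $N$.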

## Solution 2: dynamic programming

  • using dynamic programming [Bellman’s principle of optimality]

  • at a generic instant $k_1$ and state $x(k_1)=z$, consider the optimal cost-to-go

    $V_{k_1}(z)=\min\limits_{u(k_1),\cdots,u(N-1)}\left\{ \sum\limits_{k=k_1}^{N-1}\left[x'(k)Qx(k)+u'(k)Ru(k)\right]+x'(N)Q_Nx(N) \right\}$

    and

    $\begin{align} V_{0}(x(0))&=\min\limits_{U\triangleq \{u(0),\cdots,u(N-1)\}} J(x(0),U) \\ &=\min\limits_{u(0),\cdots,u(k_1-1)}\left\{ \sum\limits_{k=0}^{k_1-1}\left[x'(k)Qx(k)+u'(k)Ru(k)\right]+V_{k_1}(x(k_1))\right\} \end{align}$ $\rightarrow$ iteration

  • starting at $x(0)$, the minimum cost over $[0,N]$ equals the minimum cost spent until step $k_1$ plus the optimal cost-to-go from $k_1$ to $N$ starting at $x(k_1)$


    Bellman’s optimality principle also applies to nonlinear systems and/or non-quadratic cost functions:

    • The optimal control law can always be written in state-feedback form

    ​ $u^*(k)=f_k(x^*(k)), \ \forall k=0,\cdots,N-1$

    • Compared to the open-loop solution $\{u^*(0),\cdots, u^*(N-1)\} = f(x(0))$, the feedback form has the big advantage of being more robust to perturbations: at each time $k$ we apply the best move over the remaining period $[k, N]$

Riccati iterations

Finite-horizon optimal control

compute the optimal inputs $u^*(k)$ recursively as a function of $x^*(k)$ (Riccati iterations):

  1. Initialization: $P(N)=Q_N$
  2. For $k=N,\cdots,1$, compute recursively the matrices

​ $P(k-1)=Q+A'P(k)A-A'P(k)B(R+B'P(k)B)^{-1}B'P(k)A$

  3. Define

​ $K(k)=-(R+B'P(k+1)B)^{-1}B'P(k+1)A$

​ The optimal input is a (linear time-varying) state feedback

​ $u^*(k)=K(k)x^*(k)$
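The backward Riccati recursion can be sketched as follows (Python/NumPy; the helper name `riccati_gains` is an assumption):

```python
import numpy as np

def riccati_gains(A, B, Q, R, QN, N):
    """Backward Riccati recursion; returns the time-varying gains K(0),...,K(N-1)."""
    P = QN                                    # initialization: P(N) = Q_N
    K = [None] * N
    for k in range(N - 1, -1, -1):
        # K(k) uses P(k+1), i.e. the current P before the downdate
        K[k] = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati step: P now holds P(k)
        P = Q + A.T @ P @ A \
            - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K
```

A closed-loop simulation then simply applies $u^*(k)=K(k)x^*(k)$ at each step.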

Infinite-horizon optimal control

regulate $x(k)$

​ $V^{\infty}(x(0)) =\min\limits_{u(0),u(1),\cdots} \sum\limits_{k=0}^{\infty}\left[x'(k)Qx(k)+u'(k)Ru(k)\right]$

  • Result:

    there exists a unique solution $P_∞ \succeq 0$ of the algebraic Riccati equation (ARE):

    ​ $P_∞ = A′P_∞A + Q − A′P_∞B(B′P_∞B + R)^{−1}B′P_∞A$
    such that the optimal cost is $V_∞(x(0)) = x′(0)P_∞x(0)$ and the optimal control law is the constant linear state feedback $u(k) = K_{LQR} x(k)$ with

    ​ $K_{LQR} = −(R + B′P_∞B)^{−1} B′P_∞A$

MATLAB (note that `dlqr` returns a gain $K$ with the sign convention $u(k)=-Kx(k)$, i.e. $K_{LQR}=-K$):

```matlab
P = dare(A,B,Q,R)       % Solve discrete-time algebraic Riccati equations.
[K,P] = dlqr(A,B,Q,R)   % Linear-quadratic regulator design for discrete-time systems.
[K,P,E] = lqr(sysd,Q,R) % Linear-quadratic regulator design for state-space models.
```

where $E$ = closed-loop poles = eigenvalues of $(A + BK_{LQR})$
  • Closed-loop stability is ensured if $(A,B)$ is stabilizable, $R≻0$, $Q⪰0$, and $(A, Q^{\frac{1}{2}})$ is detectable, where $Q^{\frac{1}{2}}$ is a Cholesky factor of $Q$.

    Given a matrix $Q=Q′⪰0$, its Cholesky factor is an upper-triangular matrix $C$ such that $C′C = Q$ (MATLAB: chol)

  • LQR is an automatic and optimal way of placing poles!
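Outside MATLAB, the same infinite-horizon design can be sketched with SciPy's `solve_discrete_are`; the double-integrator data below is purely illustrative:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# illustrative double-integrator model (assumed, not from the notes)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)              # Q = Q' >= 0
R = np.array([[1.0]])      # R = R' > 0

P = solve_discrete_are(A, B, Q, R)                  # P_inf, solution of the ARE
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # K_LQR = -(R + B'PB)^{-1} B'PA
poles = np.linalg.eigvals(A + B @ K)                # all strictly inside the unit
                                                    # circle under the stated assumptions
```

The optimal cost from any $x(0)$ is then $x'(0)\,P\,x(0)$.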

regulate $y(k)$

if we just want to regulate the output $y(k)=Cx(k)$, rather than the state $x(k)$, to zero, we minimize

 $V^{\infty}(x(0))=\min\limits_{u(0),u(1),\cdots} \sum\limits_{k=0}^{\infty}\left[y'(k)Q_yy(k)+u'(k)Ru(k)\right]$

This is again an LQR problem, with equivalent state weight $Q=C'Q_y C$

MATLAB: [K,P,E] = dlqry(sysd,Qy,R) (same sign convention, $K_{LQR}=-K$)
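The output-weighted design reduces to a standard LQR call once the equivalent state weight is built; a Python sketch with illustrative data (SciPy assumed):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# illustrative system (assumed): regulate y = C x rather than the full state
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Qy = np.array([[1.0]])     # output weight
R = np.array([[0.1]])

Qx = C.T @ Qy @ C          # equivalent state weight Q = C' Q_y C (PSD, possibly singular)
P = solve_discrete_are(A, B, Qx, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

Detectability of $(A, C)$ is what makes the singular state weight acceptable here.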

Corollary:

Let $(A, B)$ be stabilizable, $(A, C)$ detectable, $R ≻ 0$, $Q_y ≻ 0$. Then the LQR control law $u(k) = K_{LQR}\, x(k)$ asymptotically stabilizes the closed-loop system:

​ $\lim\limits_{k \rightarrow \infty}x(k)=0$, $\lim\limits_{k \rightarrow \infty} u(k)=0$

  • Idea:

    Intuitively: the minimum cost $x′(0)P_∞x(0)$ is finite ⇒ $y(k) → 0$ and $u(k) → 0$.
    $y(k) → 0$ implies that the observable part of the state tends to $0$. As $u(k) → 0$, the unobservable states remain undriven and go to zero spontaneously (this is the detectability condition)

Kalman filtering

Assign the observer poles in an optimal way, that is, so as to minimize the variance of the state estimation error $\tilde{x} = x − \hat{x}$

## Model

  • linear time-varying system with noise

    $\begin{align} x(k+1)&=A(k)x(k)+B(k)u(k)+G(k)\xi(k) \\ y(k)&=C(k)x(k)+D(k)u(k)+\zeta(k)\\ x(0)&=x_0 \end{align}$

    • $\xi(k)$ process noise; assume $\mathbb{E}(\xi(k))=0$ [zero mean], $\mathbb{E}(\xi(k) \xi'(j))=0$ for $j\neq k$ [white noise], and $\mathbb{E}(\xi(k) \xi'(k))=Q(k)$ [covariance matrix]

    • $\zeta(k)$ measurement noise; assume $\mathbb{E}(\zeta(k))=0$ [zero mean], $\mathbb{E}(\zeta(k) \zeta'(j))=0$ for $j\neq k$ [white noise], and $\mathbb{E}(\zeta(k) \zeta'(k))=R(k)$ [covariance matrix]

    • $x_0$ random vector; $\mathbb{E}(x_0)=\bar{x}_0$; $\mathbb{E}[(x_0-\bar{x}_0)(x_0-\bar{x}_0)']=\mathrm{Var}[x_0]=P_0$

    • The vectors $\xi(k)$, $\zeta(k)$, $x_0$ are uncorrelated: $\mathbb{E}(\xi(k) \zeta'(j))=0$, $\mathbb{E}(\xi(k) x'_0)=0$, $\mathbb{E}(\zeta(k) x'_0)=0$

    • Probability distributions: assume normal (Gaussian) distributions

      $\xi(k)\sim\mathcal{N}(0,Q(k))$ , $\zeta(k) \sim \mathcal{N}(0,R(k))$ , $x_0 \sim \mathcal{N}(\bar{x}_0,P_0)$

## Steps

The Kalman filter provides the optimal estimate $\hat{x}(k|k)$ of $x(k)$ given the measurements up to time $k$. Optimality means that the trace of the estimation error covariance is minimized.

Step 1: Measurement update

  • Measurement update based on the most recent $y(k)$

    $M(k)=P(k|k-1) C(k)' [ C(k) P(k|k-1) C(k)'+ R(k) ]^{-1}$

    $\hat{x}(k|k)=\hat{x}(k|k-1)+M(k) [ y(k)- C(k) \hat{x}(k|k-1) -D(k)u(k) ]$

    $P(k|k)=( I - M(k) C(k)) P(k|k-1)$

    with initial conditions $\hat{x}(0|-1)=\bar{x}_0, \ P(0|-1)=P_0$

Step 2: Time update

  • time update based on the model of the system

    $\hat{x}(k+1|k)=A(k)\hat{x}(k|k)+B(k)u(k)$

    $P(k+1|k)=A(k)P(k|k)A(k)' + G(k)Q(k)G(k)'$
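The two updates combine into one recursion per sample; a minimal NumPy sketch (the function name `kf_step` is an assumption, the matrices follow the model above):

```python
import numpy as np

def kf_step(xhat, P, u, y, A, B, C, D, G, Q, R):
    """One Kalman filter cycle. xhat, P are the prediction xhat(k|k-1), P(k|k-1);
    returns the next prediction plus the filtered quantities."""
    # measurement update with y(k)
    S = C @ P @ C.T + R                        # innovation covariance
    M = P @ C.T @ np.linalg.inv(S)             # filter gain M(k)
    xfil = xhat + M @ (y - C @ xhat - D @ u)   # xhat(k|k)
    Pfil = (np.eye(len(xhat)) - M @ C) @ P     # P(k|k)
    # time update with the model
    xnext = A @ xfil + B @ u                   # xhat(k+1|k)
    Pnext = A @ Pfil @ A.T + G @ Q @ G.T       # P(k+1|k)
    return xnext, Pnext, xfil, Pfil
```

Starting from $\hat{x}(0|-1)=\bar{x}_0$, $P(0|-1)=P_0$, the step is applied once per measurement.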

Stationary Kalman Filter

  • assume $A,C,G,Q,R$ are constant

  • $P(k|k-1)$ and $M(k)$ converge to the constant matrices

    $P_{\infty} = AP_{\infty}A' + GQG' -AP_{\infty}C'[CP_{\infty}C'+R]^{-1} CP_{\infty}A'$

    $M =P_{\infty} C'(CP_{\infty}C'+R)^{-1}$

  • By setting $L=AM$, the dynamics of the prediction $\hat{x}(k|k-1)$ becomes a Luenberger observer

    $\hat{x}(k+1|k)=A\hat{x}(k|k-1)+Bu(k)+L(y(k)-C\hat{x}(k|k-1)-Du(k))$

    with all the eigenvalues of $(A-LC)$ inside the unit circle

    MATLAB:

    ```matlab
    [~,L,P,M,Z] = kalman(sys,Q,R)
    ```

    where $Z=\mathbb{E}[\tilde{x}(k|k)\tilde{x}(k|k)']$
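In Python the stationary gains follow from the dual DARE, obtained by swapping $(A,B)$ for $(A',C')$ in the LQR Riccati equation; all numeric data below is illustrative:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# illustrative data (assumed, not from the notes)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
G = np.eye(2)
Q = 0.01 * np.eye(2)      # process-noise covariance
R = np.array([[1.0]])     # measurement-noise covariance

# predictor ARE = dual of the LQR one: solve with (A', C') in place of (A, B)
Pinf = solve_discrete_are(A.T, C.T, G @ Q @ G.T, R)
M = Pinf @ C.T @ np.linalg.inv(C @ Pinf @ C.T + R)   # stationary filter gain
L = A @ M                                            # predictor gain, L = A M
```

The observer poles are the eigenvalues of $A-LC$, strictly inside the unit circle under the usual detectability/stabilizability assumptions.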

Tuning Kalman Filters

  • It is usually hard to quantify exactly the correct values of $Q$ and $R$ for a given process
  • The diagonal terms of $R$ are related to how noisy the output sensors are
  • $Q$ is harder to relate to physical noise; it mainly reflects how rough the $(A, B)$ model is
  • In the end, $Q$ and $R$ are the tuning knobs of the observer (similar to the LQR weights)
  • The “larger” $R$ is with respect to $Q$, the “slower” the observer is to converge ($L$, $M$ will be small)
  • Conversely, the “smaller” $R$ is than $Q$, the more the measurements are trusted, and the “faster” the observer converges

Extended Kalman Filter

the Kalman filter extended to nonlinear systems:

Model:

nonlinear model:

​ $\begin{align} x(k+1)&=f(x(k),u(k),\xi(k)) \\ y(k)&=g(x(k),u(k))+\zeta(k) \end{align}$

Steps:

  1. Measurement update:

    $$\begin{align} C(k)&=\frac{\partial{g}}{\partial{x}} (\hat{x}(k|k-1),u(k)) \\ M(k)&=P(k|k-1)C(k)'[C(k)P(k|k-1)C(k)'+R(k)]^{-1}\\ \hat{x}(k|k)&=\hat{x}(k|k-1)+M(k)(y(k)-g(\hat{x}(k|k-1),u(k))) \\ P(k|k)&=(I-M(k)C(k))P(k|k-1) \end{align}$$

  2. Time update:

    $$\begin{align} \hat{x}(k+1|k)&=f(\hat{x}(k|k),u(k)), &\ \ \hat{x}(0|-1)=\bar{x}_0 \\ A(k)&=\frac{\partial f}{\partial x}(\hat{x}(k|k),u(k),\mathbb{E}[\xi(k)]), &\ \ G(k)=\frac{\partial f}{\partial \xi}(\hat{x}(k|k),u(k),\mathbb{E}[\xi(k)]) \\ P(k+1|k)&=A(k)P(k|k)A(k)' + G(k)Q(k)G(k)', &\ \ P(0|-1)=P_0 \end{align}$$

  • Due to the linearisation, the EKF is in general not optimal and may even diverge; nevertheless, it is the de facto standard in nonlinear state estimation
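A minimal scalar EKF sketch of the two steps above; the model $f$, $g$, the Jacobians, and all numbers are assumptions for illustration, not from the notes:

```python
import numpy as np

# assumed scalar model: x+ = x + dt*sin(x) + xi,  y = x^2 + zeta
dt, Q, R = 0.1, 1e-3, 1e-2

def f(x, xi=0.0):  return x + dt * np.sin(x) + xi
def g(x):          return x ** 2
def dfdx(x):       return 1 + dt * np.cos(x)    # A(k), Jacobian of f
def dgdx(x):       return 2 * x                 # C(k), Jacobian of g

def ekf_step(xhat, P, y):
    """One EKF cycle on the prediction xhat(k|k-1), P(k|k-1)."""
    # measurement update: linearize g at the prediction
    Ck = dgdx(xhat)
    M = P * Ck / (Ck * P * Ck + R)
    xfil = xhat + M * (y - g(xhat))
    Pfil = (1 - M * Ck) * P
    # time update: linearize f at the filtered estimate (G(k) = 1 here)
    Ak = dfdx(xfil)
    return f(xfil), Ak * Pfil * Ak + Q
```

Note that the gains depend on the current estimate through the Jacobians, which is exactly where the loss of optimality comes from.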

LQG

LQG (Linear Quadratic Gaussian) control: LQR control + stationary Kalman predictor/filter

Model:

  • stochastic dynamic system

​ $\begin{align} x(k+1)&=Ax(k)+Bu(k)+\xi(k),& \ \ \xi\sim\mathcal{N}(0,Q_{KF}) \\ y(k)&=Cx(k)+\zeta(k),& \ \ \zeta\sim\mathcal{N}(0,R_{KF}) \end{align}$

​ with initial condition $x(0)=x_0$, $x_0\sim\mathcal{N}(\bar{x}_0,P_0)$, $P_0, Q_{KF}\succeq0$, $R_{KF}\succ0$, where $\xi$ and $\zeta$ are independent white-noise terms

  • objective: minimize the cost function

​ $J(x(0),U)=\lim\limits_{T\rightarrow \infty} \frac{1}{T} \mathbb{E} \left[ \sum\limits_{k=0}^{T} x'(k)Q_{LQ}x(k)+u'(k) R_{LQ}u(k)\right]$

​ when the state $x$ is not measurable

control

Assume:

  • the pair $(A,B)$ is reachable and the pair $(A,C_q)$, with $C_q$ such that $Q_{LQ} =C_q'C_q$, is observable (here $Q_{LQ}$ is the weight matrix of the LQ controller)
  • the pair $(A, B_q)$, with $B_q$ s.t. $Q_{KF} = B_qB_q'$, is stabilizable, and the pair $(A, C)$ is observable (here $Q_{KF}$ is the covariance matrix of the Kalman predictor/filter)

Then,

  1. Determine the optimal stationary Kalman predictor/filter, neglecting the fact that the control variable $u$ is generated through a closed-loop control scheme, and find the optimal gain $L_{KF}$
  2. Determine the optimal LQR strategy assuming the state is accessible, and find the optimal gain $K_{LQR}$

LQG controller

Analogously to the case of output-feedback control using a Luenberger observer, it is possible to show that the extended state $[x' \ \ \tilde{x}']'$ has eigenvalues equal to the eigenvalues of $(A + BK_{LQR})$ plus those of $(A − L_{KF} C)$ ($2n$ in total); this is the separation principle.
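The separation of the spectra can be checked numerically; a Python sketch with illustrative data (SciPy assumed):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# illustrative plant, weights, and covariances (assumed)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Qlq, Rlq = np.eye(2), np.array([[0.1]])         # LQR weights
Qkf, Rkf = 0.01 * np.eye(2), np.array([[1.0]])  # noise covariances

# 1) stationary Kalman predictor gain L_KF (dual ARE)
Pkf = solve_discrete_are(A.T, C.T, Qkf, Rkf)
L = A @ Pkf @ C.T @ np.linalg.inv(C @ Pkf @ C.T + Rkf)
# 2) LQR gain K_LQR as if the state were measured
Plq = solve_discrete_are(A, B, Qlq, Rlq)
K = -np.linalg.solve(Rlq + B.T @ Plq @ B, B.T @ Plq @ A)

# extended state [x; xtilde], xtilde = x - xhat: block-triangular dynamics
Acl = np.block([[A + B @ K, -B @ K],
                [np.zeros((2, 2)), A - L @ C]])
eigs = np.linalg.eigvals(Acl)
expected = np.concatenate([np.linalg.eigvals(A + B @ K),
                           np.linalg.eigvals(A - L @ C)])
# spectrum of the extended system = eig(A + BK_LQR) together with eig(A - L_KF C)
```

The block-triangular structure of `Acl` is what makes the controller and observer poles independent of each other.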