Linear Quadratic Regulator (LQR)
State-feedback control via pole placement requires one to choose the closed-loop poles by hand; LQR is a technique that places the closed-loop poles automatically and optimally.
##Problem description
Consider the system $x(k+1)=Ax(k)+Bu(k)$ with initial condition $x(0)$. We look for the optimal sequence of inputs $U=\{u(0),\ u(1),\cdots,\ u(N-1)\}$ that drives the state $x(k)$ towards the origin while minimizing the performance index
$J(x(0),U)=x'(N)Q_Nx(N)+\sum\limits_{k=0}^{N-1} \left[ x'(k)Qx(k) + u'(k)Ru(k) \right]$ (quadratic cost)
where $Q=Q' \succeq 0$, $R=R' \succ 0$, $Q_N=Q'_N \succeq 0$; here $\succeq 0$ denotes a positive semidefinite matrix and $\succ 0$ a positive definite one.
As an example, consider steering the state to zero with minimum-energy input:
$\min_U \begin{Vmatrix} \begin{bmatrix} u(0)\\ u(1)\\ \vdots \\ u(N-1) \end{bmatrix} \end{Vmatrix}$
s.t. $x(N)=0$
The minimum-energy control problem can be seen as a particular case of the LQ optimal control problem by setting $R=I, \ Q=0, \ Q_N=\infty \cdot I$
By substituting $x(k)=A^k x(0) + \sum\limits_{i=0}^{k-1} A^iBu(k-1-i)$ in
$J(x(0),U)=\sum\limits_{k=0}^{N-1} \left[ x'(k)Qx(k) + u'(k)Ru(k) \right] +x'(N)Q_Nx(N)$
we obtain $J(x(0),U)=\frac{1}{2}U'HU+x(0)'FU+\frac{1}{2}x'(0)Yx(0)$ , where $H=H' \succ 0$ is a positive definite matrix
##Solution1:
The optimizer $U^*$ is obtained by zeroing the gradient
$0=\nabla_U J(x(0),U)=HU+F'x(0)$
$\rightarrow U^* = \begin{bmatrix} u^*(0)\\ u^*(1)\\ \vdots\\ u^*(N-1) \end{bmatrix} =-H^{-1}F'x(0)$
This $U^*$ is an open-loop solution: every input depends only on the initial state, $u^*(k)=f_k(x(0)), \ k=0,1,\cdots,N-1$
- Moreover, the dimensions of the $H$ and $F$ matrices are proportional to the time horizon $N$
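As a small illustration, the sketch below builds the condensed matrices $H$ and $F$ and computes the open-loop optimizer $U^*=-H^{-1}F'x(0)$; the system matrices, weights and horizon are illustrative assumptions (a double integrator), not taken from these notes.

```matlab
% Batch (open-loop) LQ solution U* = -H^{-1} F' x(0); all numbers are illustrative
A = [1 1; 0 1];  B = [0; 1];                 % assumed double-integrator model
Q = eye(2);  R = 1;  QN = eye(2);  N = 20;   % assumed weights and horizon
x0 = [5; 0];
nx = size(A,1);  nu = size(B,2);

% Prediction matrices: [x(1);...;x(N)] = Sx*x(0) + Su*[u(0);...;u(N-1)]
Sx = zeros(N*nx, nx);  Su = zeros(N*nx, N*nu);
for i = 1:N
    Sx((i-1)*nx+1:i*nx, :) = A^i;
    for j = 1:i
        Su((i-1)*nx+1:i*nx, (j-1)*nu+1:j*nu) = A^(i-j)*B;
    end
end
Qbar = blkdiag(kron(eye(N-1), Q), QN);       % weights on x(1),...,x(N)
Rbar = kron(eye(N), R);                      % weights on u(0),...,u(N-1)

H     = 2*(Su'*Qbar*Su + Rbar);              % Hessian of the condensed cost
F     = 2*Sx'*Qbar*Su;                       % cross term x(0)'*F*U
Ustar = -H \ (F'*x0);                        % open-loop optimal input sequence
```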
##Solution2:
using dynamic programming [Bellman’s principle of optimality]
at a generic instant $k_1$ and state $x(k_1)=z$ consider the optimal cost-to-go
$V_{k_1}(z)=\min\limits_{u(k_1),\cdots,u(N-1)}\left\{ \sum\limits_{k=k_1}^{N-1}\left[x'(k)Qx(k)+u'(k)Ru(k)\right]+x'(N)Q_Nx(N) \right\}$
and
$\begin{align} V_{0}(x(0))&=\min\limits_{U\triangleq \{u(0),\cdots,u(N-1)\}} J(x(0),U) \\ &=\min\limits_{u(0),\cdots,u(k_1-1)}\left\{ \sum\limits_{k=0}^{k_1-1}\left[x'(k)Qx(k)+u'(k)Ru(k)\right]+V_{k_1}(x(k_1))\right\} \end{align}$ $\rightarrow$ iteration
starting at $x(0)$, the minimum cost over $[0,N]$ equals the minimum cost spent until step $k_1$ plus the optimal cost-to-go from $k_1$ to $N$ starting at $x(k_1)$
Bellman’s optimality principle also applies to nonlinear systems and/or non-quadratic cost functions:
- The optimal control law can be always written in state-feedback form
$u^*(k)=f_k(x^*(k)), \ \forall k=0,\cdots,N-1$
- Compared to the open-loop solution $\{u^*(0),\cdots, u^*(N-1)\} = f(x(0))$, the feedback form has the big advantage of being more robust with respect to perturbations: at each time $k$ we apply the best move over the remaining period $[k, N]$
Riccati iterations
Finite-horizon optimal control
compute the optimal inputs $u^*(k)$ recursively as a function of $x^*(k)$ (Riccati iterations):
- Initialisation: $P(N)=Q_N$
- for $k=N,\cdots,1$ , compute recursively the following matrix
$P(k-1)=Q-A'P(k)B(R+B'P(k)B)^{-1}B'P(k)A+A'P(k)A$
- Define
$K(k)=-(R+B'P(k+1)B)^{-1}B'P(k+1)A$
The optimal input is a (linear time-varying) state feedback
$u^*(k)=K(k)x^*(k)$
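A minimal sketch of the backward Riccati recursion and of the resulting time-varying feedback, again on an assumed double-integrator example (all numerical values are illustrative):

```matlab
% Finite-horizon LQR via Riccati iterations; matrices and horizon are illustrative
A = [1 1; 0 1];  B = [0; 1];
Q = eye(2);  R = 1;  QN = eye(2);  N = 20;  x0 = [5; 0];

% Backward pass: P(N) = QN, then P(k-1) and the gain K(k-1) from P(k)
P = cell(N+1,1);  P{N+1} = QN;               % P{k+1} stores P(k)
K = cell(N,1);
for k = N:-1:1
    Pk   = P{k+1};
    P{k} = A'*Pk*A - A'*Pk*B/(R + B'*Pk*B)*B'*Pk*A + Q;
    K{k} = -(R + B'*Pk*B) \ (B'*Pk*A);       % gain K(k-1), computed from P(k)
end

% Forward pass: apply the (linear time-varying) state feedback u(k) = K(k) x(k)
x = x0;
for k = 0:N-1
    u = K{k+1}*x;                            % K{k+1} holds the gain at time k
    x = A*x + B*u;
end
```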
Infinite-horizon optimal control
regulate $x(k)$
$V^{\infty}(x(0)) =\min\limits_{u(0),u(1),\cdots} \sum\limits_{k=0}^{\infty}\left[x'(k)Qx(k)+u'(k)Ru(k)\right]$
Results:
There exists a unique solution $P_\infty$ of the algebraic Riccati equation (ARE):
$P_\infty = A'P_\infty A + Q - A'P_\infty B(B'P_\infty B + R)^{-1}B'P_\infty A$
such that the optimal cost is $V_\infty(x(0)) = x'(0)P_\infty x(0)$ and the optimal control law is the constant linear state feedback $u(k) = K_{LQR}\, x(k)$ with $K_{LQR} = -(R + B'P_\infty B)^{-1} B'P_\infty A$
MATLAB: P_∞ = dare(A,B,Q,R) % solve the discrete-time algebraic Riccati equation
Closed-loop stability is ensured if (A,B) is stabilizable, $R≻0$, $Q⪰0$ , and $(A, Q^{\frac{1}{2}})$ is detectable, where $Q^{\frac{1}{2}}$ is the Cholesky factor of $Q$.
In fact, given a matrix $Q=Q' \succeq 0$, its Cholesky factor is an upper-triangular matrix $C$ such that $C'C = Q$ (MATLAB: chol)
LQR is an automatic and optimal way of placing poles!
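A possible MATLAB sketch (assuming the Control System Toolbox; data are illustrative) that solves the ARE, builds $K_{LQR}$, and checks the closed-loop poles:

```matlab
% Infinite-horizon LQR via the ARE; A, B, Q, R below are illustrative
A = [1 1; 0 1];  B = [0; 1];  Q = eye(2);  R = 1;
Pinf = dare(A, B, Q, R);                     % solution of the ARE
Klqr = -(R + B'*Pinf*B) \ (B'*Pinf*A);       % optimal gain, u(k) = Klqr*x(k)
eig(A + B*Klqr)                              % closed-loop poles, inside the unit circle
% Note: dlqr(A,B,Q,R) returns the same gain with opposite sign (convention u = -K*x)
```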
regulate $y(k)$
if we just want to regulate the output $y(k)=Cx(k)$, rather than the full state $x(k)$, to zero, the cost becomes
$V^{\infty}(x(0))=\min\limits_{u(0),u(1),\cdots} \sum\limits_{k=0}^{\infty}\left[y'(k)Q_y y(k)+u'(k)Ru(k)\right]$
Similarly, this problem is again an LQR problem with the equivalent state weight $Q=C'Q_y C$
MATLAB:
[-K_∞,P_∞,E] = dlqry(sysd,Qy,R)
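A sketch of the same idea via the equivalent state weight and dlqr (matrices are illustrative; note that dlqr uses the convention $u(k)=-Kx(k)$):

```matlab
% Output-weighted LQR through the equivalent state weight Q = C'*Qy*C (illustrative data)
A = [1 1; 0 1];  B = [0; 1];  C = [1 0];
Qy = 1;  R = 0.1;
Q  = C'*Qy*C;                  % equivalent state weight
[K, Pinf] = dlqr(A, B, Q, R);  % dlqr returns K for the convention u(k) = -K*x(k)
Klqr = -K;                     % gain in the convention u(k) = Klqr*x(k) used above
```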
Corollary:
Let $(A, B)$ be stabilizable, $(A, C)$ detectable, $R \succ 0$, $Q_y \succ 0$. Then the LQR control law $u(k) = K_{LQR} x(k)$ asymptotically stabilizes the closed-loop system:
$\lim\limits_{k \rightarrow \infty}x(k)=0$, $\lim\limits_{k \rightarrow \infty} u(k)=0$
Idea:
Intuitively: the minimum cost $x'(0)P_\infty x(0)$ is finite ⇒ $y(k) \rightarrow 0$ and $u(k) \rightarrow 0$.
$y(k) \rightarrow 0$ implies that the observable part of the state goes to $0$. As $u(k) \rightarrow 0$, the unobservable states remain undriven and go to zero spontaneously (= detectability condition)
Kalman filtering
Assign observer poles in an optimal way, that is to minimize the state estimation error $\tilde{x} = x − \hat{x}$
##Model:
linear time-varying system with noise
$\begin{align} x(k+1)&=A(k)x(k)+B(k)u(k)+G(k)\xi(k) \\ y(k)&=C(k)x(k)+D(k)u(k)+\zeta(k)\\ x(0)&=x_0 \end{align}$
$\xi(k)$ process noise; assume $\mathbb{E}(\xi(k))=0$ [zero mean], $\mathbb{E}(\xi(k) \xi'(j))=0$ for $j \neq k$ [white noise], and $\mathbb{E}(\xi(k) \xi'(k))=Q(k)$ [covariance matrix]
$\zeta(k)$ measurement noise; assume $\mathbb{E}(\zeta(k))=0$ [zero mean], $\mathbb{E}(\zeta(k) \zeta'(j))=0$ for $j \neq k$ [white noise], and $\mathbb{E}(\zeta(k) \zeta'(k))=R(k)$ [covariance matrix]
$x_0$ random vector; $\mathbb{E}(x_0)=\bar{x}_0$ ; $\mathbb{E}[(x_0-\bar{x}_0)(x_0-\bar{x}_0)']=Var[x_0]=P_0$
Vectors $\xi(k)$ , $\zeta(k)$ , $x_0$ are uncorrelated: $\mathbb{E}(\xi(k) \zeta'(j))=0$ ; $\mathbb{E}(\xi(k) x'_0)=0$; $\mathbb{E}(\zeta(k) x'_0)=0$
Probability distributions: assume normal (Gaussian) distributions
$\xi(k)\sim\mathcal{N}(0,Q(k))$ , $\zeta(k) \sim \mathcal{N}(0,R(k))$ , $x_0 \sim \mathcal{N}(\bar{x}_0,P_0)$
##Steps:
The Kalman filter provides the optimal estimate $\hat{x}(k|k)$ of $x(k)$ given the measurements up to time $k$. Optimality means that the trace of the variance $P(k+1|k)$ is minimized.
Step 1: Measurement update
Measurement update based on the most recent $y(k)$
$M(k)=P(k|k-1) C(k)' [ C(k) P(k|k-1) C(k)'+ R(k) ]^{-1}$
$\hat{x}(k|k)=\hat{x}(k|k-1)+M(k) [ y(k)- C(k) \hat{x}(k|k-1) -D(k)u(k) ]$
$P(k|k)=( I - M(k) C(k)) P(k|k-1)$
With initial conditions $\hat{x}(0|-1)=\hat{x}_0, P(0|-1)=P_0$
Step 2: Time update
time update based on the model of the system
$\hat{x}(k+1|k)=A(k)\hat{x}(k|k)+B(k)u(k)$
$P(k+1|k)=A(k)P(k|k)A(k)' + G(k)Q(k)G(k)'$
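A minimal sketch of the two steps on an assumed time-invariant example; the true plant is simulated alongside the filter just to generate measurements (all matrices and noise levels are illustrative):

```matlab
% Kalman filter recursions (measurement update + time update); illustrative data
A = [1 1; 0 1];  B = [0; 1];  C = [1 0];  D = 0;  G = eye(2);
Qn = 0.01*eye(2);  Rn = 0.1;                 % process / measurement noise covariances
x    = [5; 0];                               % true (simulated) state
xhat = [0; 0];   P = 10*eye(2);              % xhat(0|-1) and P(0|-1)

for k = 0:49
    u = 0.1*sin(0.1*k);                      % some known input
    y = C*x + D*u + sqrt(Rn)*randn;          % noisy measurement from the true plant
    % --- Step 1: measurement update ---
    M    = P*C' / (C*P*C' + Rn);
    xhat = xhat + M*(y - C*xhat - D*u);      % xhat(k|k)
    P    = (eye(2) - M*C)*P;                 % P(k|k)
    % --- Step 2: time update ---
    xhat = A*xhat + B*u;                     % xhat(k+1|k)
    P    = A*P*A' + G*Qn*G';                 % P(k+1|k)
    % simulate the true plant one step ahead
    x = A*x + B*u + G*(sqrtm(Qn)*randn(2,1));
end
```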
Stationary Kalman Filter
assume $A,C,G,Q,R$ are constant
then $P(k|k-1)$ and $M(k)$ converge to the constant matrices $P_\infty$ and $M$ satisfying
$P_{\infty} = AP_{\infty}A' + GQG' -AP_{\infty}C'[CP_{\infty}C'+R]^{-1} CP_{\infty}A'$
$M =P_{\infty} C'(CP_{\infty}C'+R)^{-1}$
By setting $L=AM$, the dynamics of the prediction $\hat{x}(k|k-1)$ becomes a Luenberger observer
$\hat{x}(k+1|k)=A\hat{x}(k|k-1)+Bu(k)+L(y(k)-C\hat{x}(k|k-1)-Du(k))$
With all the eigenvalues of $(A-LC)$ inside the unit circle
MATLAB:
[~,L,P_∞,M,Z] = kalman(sys,Q,R)
where $Z=\mathbb{E}[\tilde{x}(k|k)\tilde{x}(k|k)']$
Tuning Kalman Filters
- It is usually hard to quantify exactly the correct values of $Q$ and $R$ for a given process
- The diagonal terms of $R$ are related to how noisy the output sensors are
- $Q$ is harder to relate to physical noise; it mainly reflects how rough the $(A, B)$ model is
- After all, $Q$ and $R$ are the tuning knobs of the observer (similar to LQR)
- The “larger” $R$ is with respect to $Q$, the “slower” the observer is to converge ($L$, $M$ will be small)
- On the contrary, the “smaller” $R$ is than $Q$, the more the measurements are trusted, and the “faster” the observer will be to converge (see the numerical sketch after this list)
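A small numerical illustration of this trade-off, iterating the prediction-covariance recursion to (near) stationarity for a “precise” and a “noisy” sensor; all numbers are assumptions:

```matlab
% Effect of the R/Q ratio on the stationary gain M (illustrative numbers)
A = [1 1; 0 1];  C = [1 0];  G = eye(2);  Qn = 0.01*eye(2);
for Rn = [0.01 10]                               % "precise" vs "noisy" sensor
    P = eye(2);
    for k = 1:500                                % iterate until (near) stationarity
        M = P*C' / (C*P*C' + Rn);                % gain
        P = A*(eye(2) - M*C)*P*A' + G*Qn*G';     % prediction covariance update
    end
    fprintf('R = %5.2f  ->  M = [%6.3f  %6.3f]\n', Rn, M);  % larger R gives smaller M
end
```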
Extended Kalman Filter
The Kalman filter can be extended to nonlinear systems:
Model:
nonlinear model:
$\begin{align} x(k+1)&=f(x(k),u(k),\xi(k)) \\ y(k)&=g(x(k),u(k))+\zeta(k) \end{align}$
Steps:
Measurement update:
$$\begin{align} C(k)&=\frac{\partial{g}}{\partial{x}} (\hat{x}(k|k-1),u(k)) \\ M(k)&=P(k|k-1)C(k)'[C(k)P(k|k-1)C(k)'+R(k)]^{-1}\\ \hat{x}(k|k)&=\hat{x}(k|k-1)+M(k)(y(k)-g(\hat{x}(k|k-1),u(k))) \\ P(k|k)&=(I-M(k)C(k))P(k|k-1) \end{align}$$
Time update:
$$\begin{align} \hat{x}(k+1|k)&=f(\hat{x}(k|k),u(k)), &\quad \hat{x}(0|-1)=\hat{x}_0 \\ A(k)&=\frac{\partial f}{\partial x}(\hat{x}(k|k),u(k),\mathbb{E}[\xi(k)]), &\quad G(k)=\frac{\partial f}{\partial \xi}(\hat{x}(k|k),u(k),\mathbb{E}[\xi(k)]) \\ P(k+1|k)&=A(k)P(k|k)A(k)' + G(k)Q(k)G(k)', &\quad P(0|-1)=P_0 \end{align}$$
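A minimal EKF sketch on an assumed scalar nonlinear model (the model, Jacobians and noise levels below are illustrative, not taken from these notes):

```matlab
% EKF recursions for the illustrative scalar model
%   x(k+1) = x(k) + Ts*sin(x(k)) + u(k) + xi(k),   y(k) = x(k)^2 + zeta(k)
Ts = 0.1;  Qn = 1e-3;  Rn = 1e-2;
f    = @(x,u) x + Ts*sin(x) + u;       % state update (additive noise, so G(k) = 1)
g    = @(x)   x.^2;                    % output map
dfdx = @(x) 1 + Ts*cos(x);             % Jacobian of f w.r.t. x
dgdx = @(x) 2*x;                       % Jacobian of g w.r.t. x

xhat = 0.5;  P = 1;                    % xhat(0|-1) and P(0|-1)
x = 1;                                 % true (simulated) state
for k = 0:99
    u = 0;
    y = g(x) + sqrt(Rn)*randn;         % noisy measurement from the true plant
    % measurement update (linearize g at xhat(k|k-1))
    Ck   = dgdx(xhat);
    M    = P*Ck / (Ck*P*Ck + Rn);
    xhat = xhat + M*(y - g(xhat));
    P    = (1 - M*Ck)*P;
    % time update (linearize f at xhat(k|k))
    Ak   = dfdx(xhat);
    xhat = f(xhat, u);
    P    = Ak*P*Ak + Qn;
    % simulate the true plant one step ahead
    x = f(x, u) + sqrt(Qn)*randn;
end
```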
- The EKF is in general not optimal and may even diverge, due to the linearisation, but it is the de facto standard in nonlinear state estimation
LQG
LQG (Linear Quadratic Gaussian) control: LQR control + stationary Kalman predictor/filter
Model:
- stochastic dynamic system
$\begin{align} x(k+1)&=Ax(k)+Bu(k)+\xi(k), &\quad \xi(k)\sim\mathcal{N}(0,Q_{KF}) \\ y(k)&=Cx(k)+\zeta(k), &\quad \zeta(k)\sim\mathcal{N}(0,R_{KF}) \end{align}$
with initial condition $x(0)=x_0$, $x_0\sim\mathcal{N}(\bar{x}_0,P_0)$, $P_0, Q_{KF}\succeq0$, $R_{KF}\succ0$, and $\xi$ and $\zeta$ independent white noise terms
- objective is to minimize the cost function:
$J(x(0),U)=\lim\limits_{T\rightarrow \infty} \frac{1}{T} \mathbb{E} \left[ \sum\limits_{k=0}^{T} x'(k)Q_{LQ}x(k)+u'(k) R_{LQ}u(k)\right]$
when the state $x$ is not measurable
Control design: assume that
- the pair $(A,B)$ is reachable and the pair $(A,C_q)$, with $C_q$ such that $Q_{LQ} =C_qC_q'$, is observable (here $Q_{LQ}$ is the weight matrix of the LQ controller)
- the pair $(A, B_q)$, with $B_q$ such that $Q_{KF} = B_qB_q'$, is stabilizable, and the pair $(A, C)$ is observable (here $Q_{KF}$ is the covariance matrix of the Kalman predictor/filter)
Then,
- Determine the optimal stationary Kalman predictor/filter, neglecting the fact that the control variable $u$ is generated through a closed-loop control scheme, and find the optimal gain $L_{KF}$
- Determine the optimal LQR strategy assuming the state accessible, and find the optimal gain $K_{LQR}$
Analogously to the case of output-feedback control using a Luenberger observer, it is possible to show that the closed-loop dynamics of the extended state $[x' \ \ \tilde{x}']'$ has eigenvalues equal to the eigenvalues of $(A + BK_{LQR})$ plus those of $(A - L_{KF} C)$ ($2n$ in total), as sketched below
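A hedged sketch that assembles the two pieces on illustrative data (assumes the Control System Toolbox; the stationary Kalman gain is obtained here via the dual ARE rather than the kalman command):

```matlab
% LQG = LQR gain + stationary Kalman predictor; all matrices are illustrative
A = [1 1; 0 1];  B = [0; 1];  C = [1 0];  G = eye(2);
Qlq = eye(2);  Rlq = 1;                      % LQR weights
Qkf = 0.01*eye(2);  Rkf = 0.1;               % noise covariances

% LQR gain, designed as if the state were measurable
Plq  = dare(A, B, Qlq, Rlq);
Klqr = -(Rlq + B'*Plq*B) \ (B'*Plq*A);

% Stationary Kalman predictor gain via the dual ARE
Pkf = dare(A', C', G*Qkf*G', Rkf);
M   = Pkf*C' / (C*Pkf*C' + Rkf);
L   = A*M;

% Output-feedback controller: u(k) = Klqr*xhat(k|k-1),
%   xhat(k+1|k) = A*xhat(k|k-1) + B*u(k) + L*(y(k) - C*xhat(k|k-1))
[eig(A + B*Klqr); eig(A - L*C)]              % the 2n closed-loop eigenvalues
```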