7. Nonlinear Conic Programming

Description:

Conic optimization is a subfield of convex optimization that studies problems consisting of minimizing a convex function over the intersection of an affine subspace and a convex cone.

The class of conic optimization problems includes some of the best-known classes of convex optimization problems, namely linear programming and semidefinite programming.

Certain special cases of conic optimization problems have notable closed-form expressions of their dual problems.

7.1 Nonlinear conic programming

Optimization problem:
$$
\begin{array}{cl} \text{(OP)} &
\begin{array}{cl}
{\min} & {f(x)} \\ {\text {s.t.}} & {g(x)=0} \\ {} & {h(x) \in \mathcal C}
\end{array}
\end{array}
$$
where $f: \mathcal X \rightarrow \mathbb R$, $g: \mathcal X \rightarrow \mathcal U$, and $h: \mathcal X \rightarrow \mathcal V$ are continuous functions. $\mathcal X, \mathcal U, \mathcal V$ are finite-dimensional Euclidean spaces, each equipped with a scalar product $\langle \cdot, \cdot \rangle$ and its induced norm $\| \cdot \|$, and $\mathcal C \subset \mathcal V$ is a closed convex set.

Example :

  • NLP is a special case of OP

    $\mathcal X = \mathbb R^n, \mathcal U = \mathbb R^m, \mathcal V = \mathbb R^p, \mathcal C = \mathbb R_+^p$
    $$
    \begin{array}{cl} \text{(NLP)} &
    \begin{array}{cl}
    {\min} & {f(x)} \\ {\text {s.t.}} & {g(x)=0} \\ {} & {h(x)\geq 0}
    \end{array}
    \end{array}
    $$

  • ==Linear Semidefinite Programming==
    $$
    \begin{array}{cl} \text{(SDP)} &
    \begin{array}{cl}
    {\min} & {\langle C, X\rangle} \\ {\text {s.t.}} & {\mathcal{A}(X)-b=0} \\ {} & {X \succeq 0 \quad\left(\equiv X \in \mathbb{S}_{+}^{n}\right)}
    \end{array}
    \end{array}
    $$
    $\mathcal X = \mathbb S^n$, the real Euclidean space of $n\times n$ real symmetric matrices with the standard trace inner product $\langle X,Y \rangle = \operatorname{tr} (X^T Y)$ and its induced norm $\| X \| = \sqrt{\langle X, X \rangle}$. $\mathcal C = \mathbb S_+^n$ (the cone of positive semidefinite matrices). The linear map $\mathcal A(X) = \begin{bmatrix} \langle A_1, X \rangle \\ \vdots \\ \langle A_m, X \rangle \end{bmatrix}$.

    Let $\mathcal A^* : \mathbb R^m \rightarrow \mathbb S^n$ be the adjoint of $\mathcal A$, defined by $\mathcal A^* y = \sum\limits_{k=1}^m y_kA_k$, $y\in \mathbb R^m$, so that
    $$
    \langle \mathcal A^* y, X \rangle = \langle y, \mathcal A X \rangle
    $$
    If $\mathcal A(X) = \operatorname{diag} (X)$, then $A_k = E_{kk}$ (the matrix with a single $1$ in position $(k,k)$), and $\mathcal A^* y = \operatorname{diag} (y)$.
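The adjoint identity $\langle \mathcal A^* y, X \rangle = \langle y, \mathcal A X \rangle$ is easy to check numerically. A minimal sketch for the $\mathcal A(X)=\operatorname{diag}(X)$ case (NumPy; the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def A(X):
    """A(X) = diag(X): extract the diagonal, S^n -> R^n."""
    return np.diag(X)

def A_star(y):
    """Adjoint A*(y) = Diag(y) = sum_k y_k E_kk, R^n -> S^n."""
    return np.diag(y)

X = rng.standard_normal((n, n))
X = (X + X.T) / 2                 # make X symmetric
y = rng.standard_normal(n)

lhs = np.trace(A_star(y).T @ X)   # <A* y, X> (trace inner product)
rhs = y @ A(X)                    # <y, A(X)> (Euclidean inner product)
assert abs(lhs - rhs) < 1e-10
```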

Let $G \in \mathbb R^{p\times n}$ and $H\in \mathbb R^{q\times n}$, and consider the linear map $\mathcal A: \mathbb S^n \rightarrow \mathbb R^{p \times q}$ defined by $\mathcal A(X) = GXH^T$. Show that the adjoint map $\mathcal A^*: \mathbb R^{p \times q} \rightarrow \mathbb S^n$ is given by
$$
\mathcal A^*(Y) = \frac{1}{2} \left( G^TYH+H^TY^TG \right)
$$
Solution:

For $Y\in \mathbb R^{p\times q}$, we have
$$
\begin{aligned}
\langle Y, \mathcal A(X) \rangle & = \operatorname{tr} (Y^T \mathcal A(X)) \\
& = \operatorname{tr} (Y^TGXH^T) \\
& = \operatorname{tr} (H^TY^TGX) \\
& = \frac{1}{2} \operatorname{tr} ( (H^TY^TG + G^TYH)X ) \\
& = \langle X, \underbrace{\frac{1}{2} (H^TY^TG + G^TYH)}_{\text{symmetric}} \rangle
\end{aligned}
$$
because
$$
\begin{aligned}
\operatorname{tr} (H^TY^TGX) & = \operatorname{tr} (XH^TY^TG) \\
& = \operatorname{tr} (G^TYHX^T)\\
& = \operatorname{tr} (G^TYHX) \quad (X\text{ is symmetric})
\end{aligned}
$$
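The computation above can be verified numerically; a small sketch with random $G$, $H$, symmetric $X$, and $Y$ (the dimensions are our choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 4, 3, 5
G = rng.standard_normal((p, n))
H = rng.standard_normal((q, n))

def A(X):
    """A(X) = G X H^T : S^n -> R^{p x q}."""
    return G @ X @ H.T

def A_star(Y):
    """Claimed adjoint: the symmetrized 0.5 (G^T Y H + H^T Y^T G)."""
    return 0.5 * (G.T @ Y @ H + H.T @ Y.T @ G)

X = rng.standard_normal((n, n))
X = (X + X.T) / 2                  # X must be symmetric
Y = rng.standard_normal((p, q))

lhs = np.trace(Y.T @ A(X))         # <Y, A(X)>
rhs = np.trace(A_star(Y).T @ X)    # <X, A*(Y)>
assert abs(lhs - rhs) < 1e-9
```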

  • Semidefinite least squares problem

    If $f(X) = \frac{1}{2} \| X-B \|^2$ where $B\in \mathbb S^n$ is a given matrix, the problem becomes
    $$
    \text{(SDPLS)} \quad \min\left\{ \frac{1}{2} \| X-B \|^2 \ \Big\vert \ \mathcal A(X)=b,\ X \succeq 0 \right\}
    $$

Let $\mathcal Y := \mathcal U \times \mathcal V$ and $\mathcal K := \{0^{\mathcal U}\} \times \mathcal C \subset \mathcal Y$. Define $G: \mathcal X \rightarrow \mathcal Y$ by $G(x):= (g(x), h(x))$, $x\in \mathcal X$.

7.1.1 COP problem

==Then the OP can be rewritten in the more compact form COP==
$$
\begin{array}{cl} \text{(COP)} &
\begin{array}{cl}
{\min} & {f(x)} \\ {\text {s.t.}} & {G(x)\in \mathcal K}
\end{array}
\end{array}
$$

  • Define the Lagrangian function $L : \mathcal X \times \mathcal Y \rightarrow \mathbb R$ and the dual function $\theta$ for COP by
    $$
    L(x,\mu) = f(x) - \langle \mu, G(x) \rangle, \quad x\in \mathcal X, \ \mu\in \mathcal Y \\
    \theta(\mu) := \inf \{ L(x,\mu) \mid x\in \mathcal X \}
    $$

==Dual Problem of COP==:
$$
\begin{array}{cl}
{\text{(COD)}} & \max\left\{ \theta(\mu) \ \vert \ \mu \in \mathcal K^* \right\}
\end{array}
$$
where $\mathcal K^* = \mathcal U \times \mathcal C^*$ is the dual cone of $\mathcal K$, since $(\{ 0^{\mathcal U} \})^* = \mathcal U$. (For SDP, $\mathcal C^* = \mathcal C$ because $\mathbb S_+^n$ is self-dual.)

We can see that in conic programming we only impose $\mu\in\mathcal K^*$ in the dual problem. However, if needed, we can also move this constraint into the Lagrangian dual function and the Lagrangian function, as we usually do in basic nonlinear programming.

We also define the Lagrangian function in conic programming as $L(x,\mu)=f(x) - \langle \mu,G(x) \rangle$, with a minus instead of a plus. This is for convenience: the cone is usually defined as “positive”, so the common case in conic programming is $h(x)\succeq 0$, rather than $h(x)\leq 0$ as in basic nonlinear programming.

  • NLP

$\mathcal K = \{ 0^m \} \times \mathbb R_+^p$, and $\mathcal K^* =\mathbb R^m \times \mathbb R_+^p$. The dual problem is
$$
\max \{ \theta(\lambda,\rho) \ \vert \ (\lambda, \rho)\in \mathcal K^* = \mathbb R^m \times \mathbb R_+^p \} \\
\theta (\lambda,\rho) = \inf \{ f(x)- \langle (\lambda,\rho),(g(x),h(x)) \rangle \ \vert \ x\in \mathbb R^n \}
$$

  • linear SDP

$\mathcal K = \{0^m\} \times \mathbb S_+^n$. For $X\in \mathbb S^n$ and $(y,Z) \in \mathcal Y := \mathbb R^m \times \mathbb S^n$,
$$
\begin{aligned}
L(X,y,Z) & = \langle C,X \rangle - \langle (y,Z), (\mathcal A(X)-b, X) \rangle \\
& = \langle C,X \rangle - \langle y, \mathcal A(X)-b\rangle - \langle Z,X \rangle\\
& = \langle C,X \rangle - \langle \mathcal A^* y, X \rangle + \langle b, y \rangle - \langle Z,X \rangle \\
& = \langle b,y \rangle + \langle C - \mathcal A^* y -Z, X \rangle
\end{aligned}
$$
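A quick numerical sanity check of this rearrangement with random data (all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3

def sym(M):
    return (M + M.T) / 2

C = sym(rng.standard_normal((n, n)))
As = [sym(rng.standard_normal((n, n))) for _ in range(m)]
b = rng.standard_normal(m)

A = lambda X: np.array([np.trace(Ak.T @ X) for Ak in As])   # A : S^n -> R^m
A_star = lambda y: sum(yk * Ak for yk, Ak in zip(y, As))    # A* : R^m -> S^n
ip = lambda U, V: np.trace(U.T @ V)                         # trace inner product

X = sym(rng.standard_normal((n, n)))
y = rng.standard_normal(m)
Z = sym(rng.standard_normal((n, n)))

L_direct = ip(C, X) - y @ (A(X) - b) - ip(Z, X)             # definition of L
L_rearr = b @ y + ip(C - A_star(y) - Z, X)                  # rearranged form
assert abs(L_direct - L_rearr) < 1e-9
```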

$$
\begin{aligned}
\theta (y,Z) &= \inf \{ L(X,y, Z) \ \vert \ X\in \mathbb S^n \} \\
&= \inf \{ \langle b,y \rangle + \langle C - \mathcal A^* y -Z, X \rangle \ \vert \ X\in \mathbb S^n \} \\
& = \begin{cases}
\langle b,y \rangle & \text{if } C-\mathcal A^* y - Z = 0 \\
-\infty & \text{otherwise}
\end{cases}
\end{aligned}
$$

Dual problem of SDP is
$$
\max \{\theta (y,Z) \ \vert \ (y,Z)\in \mathcal K^* \} = \max \left\{ \langle b,y \rangle \ \vert \ C-\mathcal A^* y - Z = 0,\ y\in \mathbb R^m,\ Z\succeq0 \right\}
$$

  • SDPLS

$\mathcal K = \{0^m\}\times \mathbb S_+^n$. For $X\in \mathbb S^n$ and $(y,Z)\in \mathcal Y:=\mathbb R^m \times \mathbb S^n$,
$$
\begin{aligned}
L(X,y,Z) &= \frac{1}{2} \| X-B \|^2 + \langle y,\ b-\mathcal A(X) \rangle + \langle Z,-X \rangle \\
&= \langle b,y \rangle + \frac{1}{2} \| B\|^2 + \frac{1}{2} \|X \|^2 - \langle B + \mathcal A^* y +Z,\ X \rangle
\end{aligned}
$$
Thus, $\min \{ L(X,y,Z) \mid X\in \mathbb S^n \}$ can be found by solving
$$
\nabla_XL(X,y,Z) = X-(B + \mathcal A^* y +Z) = 0
$$
Substituting, we get
$$
\begin{aligned}
\theta(y,Z) &:= \min\{ L(X,y,Z) \mid X\in \mathbb S^n \} \\
&= \langle b,y \rangle + \frac{1}{2} \| B\|^2 - \frac{1}{2} \|X \|^2 \\
&= \langle b,y \rangle + \frac{1}{2} \| B\|^2 - \frac{1}{2} \| B + \mathcal A^* y +Z \|^2
\end{aligned}
$$
The dual problem is
$$
\begin{aligned}
& \quad \ \max \left\{ \theta(y,Z) \mid (y,Z)\in \mathcal K^* \right\} \\
&= \max \left\{ \langle b,y \rangle - \frac{1}{2} \| B + \mathcal A^* y +Z \|^2 \ \Big\vert \ y\in \mathbb R^m,\ Z\in \mathbb S_+^n \right\} + \frac{1}{2} \| B\|^2
\end{aligned}
$$
Dropping the constant $\frac{1}{2}\|B\|^2$ and flipping the sign, this is equivalent to
$$
\begin{aligned}
& \quad \ \min \left\{ \langle -b,y \rangle + \frac{1}{2} \| B + \mathcal A^* y +Z \|^2 \ \Big\vert \ y\in \mathbb R^m,\ Z\in \mathbb S_+^n \right\}\\
&= \min\limits_{y\in \mathbb R^m} \left\{ \langle -b,y \rangle + \min \left\{ \frac{1}{2} \| B + \mathcal A^* y +Z \|^2 \ \Big\vert \ Z\in \mathbb S_+^n \right\} \right\}\\
&= \min\limits_{y\in \mathbb R^m} \phi(y) := \langle -b,y \rangle + \frac{1}{2} \| \Pi_{\mathbb S_+^n} (B+\mathcal A^* y) \|^2
\end{aligned}
$$
because, for fixed $y$, set $G = B + \mathcal A^* y$. By the Moreau decomposition,
$$
\begin{aligned}
& G = \Pi_{\mathbb S_+^n}(G) - \Pi_{\mathbb S_+^n} (-G) \\
\Rightarrow \ & G + \Pi_{\mathbb S_+^n} (-G) = \Pi_{\mathbb S_+^n} (G)
\end{aligned}
$$
so the inner minimum over $Z\in \mathbb S_+^n$ is attained at $Z = \Pi_{\mathbb S_+^n}(-G)$, for which $B + \mathcal A^* y + Z = \Pi_{\mathbb S_+^n}(B+\mathcal A^* y)$.
This dual problem is an unconstrained problem; we can solve it by finding a root of the gradient
$$
0=\nabla\phi(y) = -b+\mathcal A\, \Pi_{\mathbb S_+^n} (B+\mathcal A^*y)
$$
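As a concrete sketch, take $\mathcal A(X)=\operatorname{diag}(X)$ and $b=\mathbf 1$ (the nearest-correlation-matrix instance, so $\mathcal A^* y = \operatorname{Diag}(y)$), and run plain gradient descent on $\phi$. This is a minimal illustration with our own toy setup, not a production method:

```python
import numpy as np

def proj_psd(M):
    """Projection onto S_+^n: eigendecompose and clip negative eigenvalues."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0)) @ V.T

rng = np.random.default_rng(3)
n = 4
B = (lambda M: (M + M.T) / 2)(rng.standard_normal((n, n)))
b = np.ones(n)                       # constraint diag(X) = 1

# phi(y) = <-b,y> + 0.5 ||Pi(B + Diag(y))||^2, grad phi(y) = -b + diag(Pi(B + Diag(y)))
y = np.zeros(n)
for _ in range(20000):
    X = proj_psd(B + np.diag(y))
    grad = -b + np.diag(X)
    if np.linalg.norm(grad) < 1e-9:
        break
    y -= grad                        # step size 1 is safe here: ||A|| = 1 for A = diag

X = proj_psd(B + np.diag(y))
assert np.linalg.norm(np.diag(X) - b) < 1e-4   # primal feasibility recovered
assert np.linalg.eigvalsh(X).min() > -1e-8     # X is (numerically) PSD
```

In practice a faster scheme such as a semismooth Newton method would be used on the same root-finding problem; the gradient loop above only illustrates the dual formulation.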

  • Sparse Regression problem:
    $$
    \min \left\{\frac{1}{2} \| Ax-b \|^2 + \lambda \| x \|_1 \ \Big\vert \ x\in \mathbb R^n \right\}
    $$
    Let $u=b-Ax$ and rewrite the problem as
    $$
    \begin{array}{cl} \text{(P)} &
    \begin{array}{cl}
    {\min} & {f(u)+g(x)} \\ {\text {s.t.}} & {u+Ax=b}
    \end{array}
    \end{array}
    $$
    where $f(u) = \frac{1}{2} \|u\|^2$ and $g(x) = \lambda \|x \|_1$. The Lagrangian is
    $$
    \begin{aligned}
    L(u,x;\xi) &= f(u) + g(x) + \langle \xi, b-u-Ax \rangle \\
    &= f(u) - \langle \xi,u \rangle + g(x) - \langle A^T\xi, x \rangle + \langle \xi,b \rangle
    \end{aligned}
    $$

The dual function is
$$
\begin{aligned}
\min\limits_{u,x} L(u,x;\xi) & = \langle \xi,b \rangle + \min\limits_u \left\{ f(u) - \langle \xi,u \rangle \right\} + \min\limits_x \left\{ g(x) - \langle A^T\xi, x \rangle \right\} \\
& = \langle \xi,b \rangle - \frac{1}{2} \|\xi\|^2 + \min\limits_x \left\{ g(x) - \langle A^T\xi, x \rangle \right\} \\
& = \langle \xi,b \rangle - \frac{1}{2} \|\xi\|^2 - \delta_{B_\lambda}(A^T\xi)
\end{aligned}
$$
where $B_\lambda = \{ v\in \mathbb R^n \ \vert \ \| v\|_\infty\leq\lambda \}$ (see Chapter 3.12, Fenchel conjugate).

The dual problem is
$$
\begin{aligned}
\max\limits_{\xi \in \mathbb R^m} \left\{ \min\limits_{u,x} L(u,x;\xi) \right\} &= \max\limits_{\xi \in \mathbb R^m} \left\{ \langle \xi,b \rangle - \frac{1}{2} \|\xi\|^2 - \delta_{B_\lambda}(A^T\xi) \right\} \\
&= - \min\limits_{\xi \in \mathbb R^m} \left\{ \langle -b,\xi \rangle + \frac{1}{2} \|\xi\|^2 + \delta_{B_\lambda}(A^T\xi) \right\}
\end{aligned}
$$
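Weak duality gives a simple numerical check: any dual-feasible $\xi$ (i.e., $\|A^T\xi\|_\infty \le \lambda$) yields the lower bound $\langle \xi,b\rangle - \frac{1}{2}\|\xi\|^2$ on the primal objective at any $x$. A sketch with random data (our own toy instance):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, lam = 6, 10, 0.5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def primal(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()

def dual(xi):
    """Dual value; -inf if xi violates ||A^T xi||_inf <= lam."""
    if np.abs(A.T @ xi).max() > lam + 1e-12:
        return -np.inf
    return xi @ b - 0.5 * xi @ xi

# scale b into the dual-feasible box B_lam, then weak duality must hold
xi = b * min(1.0, lam / np.abs(A.T @ b).max())
for x in [np.zeros(n), rng.standard_normal(n)]:
    assert dual(xi) <= primal(x) + 1e-12
```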

7.1.2 Lagrange Multipliers & KKT:

$\bar \mu \in \mathcal Y$ is a Lagrange multiplier at a feasible point $\bar x$ (i.e., $G(\bar x) \in \mathcal K$) if it satisfies the KKT conditions; we write $\mathcal M(\bar x)$ for the set of all Lagrange multipliers at $\bar x$. The KKT conditions are
$$
0=\nabla_x L(\bar x, \bar \mu) = \nabla f(\bar x) - \nabla G(\bar x) \bar\mu, \quad 0\in \bar\mu+\mathcal N_\mathcal K (G(\bar x))
$$
or
$$
0=\nabla_x L(\bar x, \bar \mu) = \nabla f(\bar x) - \nabla G(\bar x) \bar\mu, \quad \mathcal K \ni G(\bar x) \perp \bar \mu \in \mathcal K^*
$$

because
$$
\begin{aligned}
0\in \bar\mu+\mathcal N_\mathcal K (G(\bar x)) &\iff -\bar \mu \in \mathcal N_\mathcal K (G(\bar x))\\
&\iff G(\bar x)\in \mathcal K, \quad \langle -\bar\mu, d-G(\bar x) \rangle \leq 0 \quad \forall d\in \mathcal K \\
&\iff G(\bar x)\in \mathcal K, \quad \langle \bar\mu, G(\bar x) \rangle = 0, \quad \langle \bar\mu, d \rangle \geq 0 \quad \forall d\in \mathcal K \quad (\text{set } d=2G(\bar x) \text{ and } d=0)\\
&\iff G(\bar x)\in \mathcal K, \quad \langle \bar\mu, G(\bar x) \rangle = 0, \quad \bar\mu\in\mathcal K^* \quad (\text{definition of dual cone})\\
&\iff \mathcal K \ni G(\bar x) \perp \bar \mu \in \mathcal K^*
\end{aligned}
$$
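A tiny NLP instance makes the conditions concrete: minimize $\frac{1}{2}\|x\|^2$ subject to $x_1+x_2=1$ and $x_1\ge 0$ (our own toy example; the minimizer and multipliers are computed by hand):

```python
import numpy as np

# Toy NLP: min 0.5 ||x||^2  s.t.  g(x) = x1 + x2 - 1 = 0,  h(x) = x1 >= 0
x_bar = np.array([0.5, 0.5])     # minimizer, found by hand
lam, rho = 0.5, 0.0              # multipliers; h is inactive at x_bar, so rho = 0

grad_f = x_bar                   # gradient of 0.5 ||x||^2
grad_g = np.array([1.0, 1.0])
grad_h = np.array([1.0, 0.0])

# stationarity (with the minus convention of this chapter)
assert np.allclose(grad_f - lam * grad_g - rho * grad_h, 0)
# feasibility: g(x_bar) = 0, h(x_bar) in C = R_+
assert abs(x_bar.sum() - 1) < 1e-12 and x_bar[0] >= 0
# complementarity and dual feasibility: C ∋ h(x_bar) ⊥ rho ∈ C*
assert abs(x_bar[0] * rho) < 1e-12 and rho >= 0
```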

  • The set $\mathcal M(\bar x)$ of Lagrange multipliers may be empty or unbounded.

Example:

  • NLP:

    Let $\bar\mu = (\lambda,\rho)$, $G(x) = (g(x),h(x))$, $\mathcal C=\mathbb R_+^p$, $\mathcal K=\{0_m\}\times\mathcal C$. The KKT conditions are:
    $$
    \nabla f(\bar x) - \nabla g(\bar x)\lambda - \nabla h(\bar x)\rho = 0, \quad g(\bar x)=0, \quad h(\bar x)\geq 0, \quad \langle h(\bar x),\rho \rangle = 0, \quad \rho\in\mathcal C^*=\mathbb R_+^p
    $$

  • DNN projection

    problem:
    $$
    \min \left\{ \frac{1}{2}\|X-B\|^2 \ \Big\vert \ X\in \mathbb S_+^n \cap \mathcal N^n \right\}
    $$
    Rewrite it as
    $$
    \min \left\{ f(X) := \frac{1}{2}\|X-B\|^2 \ \Big\vert \ G(X)=(X,X)\in \mathcal K := \mathbb S_+^n \times \mathcal N^n \right\}
    $$
    where $\mathcal N^n$ denotes the cone of $n\times n$ symmetric, entrywise nonnegative matrices.

    Lagrange function:
    $$
    \begin{aligned}
    L(X;S,Z) & = \frac{1}{2}\|X-B\|^2 - \langle (S,Z), (X,X) \rangle, \quad (S,Z)\in \mathcal K^* := \mathbb S_+^n \times \mathcal N^n \\
    \theta(S,Z) & = \inf \left\{ \frac{1}{2}\|X-B\|^2 - \langle (S,Z), (X,X) \rangle \ \Big\vert \ X\in \mathbb S^n \right\}
    \end{aligned}
    $$
    dual problem:
    $$
    \max \left{ \theta(S,Z) \ \vert \ (S,Z)\in \mathcal K^* \right}
    $$
    KKT condition:
    $$
    \nabla f(X) - \nabla G(X)^*(S,Z) = 0, \quad \mathcal K \ni G(X) \perp (S,Z) \in \mathcal K^* \\
    \iff X-B-S-Z = 0, \quad \mathbb S_+^n \ni X \perp S \in \mathbb S_+^n, \quad \mathcal N^n \ni X \perp Z \in \mathcal N^n
    $$
    Rewrite Dual Problem:
    $$
    \begin{aligned}
    \max \left\{ \theta(S,Z) \ \vert \ (S,Z)\in \mathcal K^* \right\} &= \max \left\{ L(X;S,Z) \ \Big\vert \ X=B+S+Z,\ (S,Z)\in \mathcal K^* \right\} \\
    &= \max \left\{ \frac{1}{2}\|S+Z\|^2 - \langle S+Z,\ B+S+Z \rangle \ \Big\vert \ (S,Z)\in \mathcal K^* \right\}\\
    &= \max \left\{ -\frac{1}{2}\|S+Z\|^2 - \langle B,\ S+Z \rangle \ \Big\vert \ (S,Z)\in \mathcal K^* \right\}
    \end{aligned}
    $$
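The projection onto $\mathbb S_+^n \cap \mathcal N^n$ has no closed form, but both cones individually have cheap projections, so one standard way to compute it numerically is Dykstra's alternating projection algorithm. A minimal sketch (the helper names are ours):

```python
import numpy as np

def proj_psd(M):
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0)) @ V.T     # clip negative eigenvalues

def proj_nonneg(M):
    return np.maximum(M, 0)                 # entrywise clipping

def dnn_projection(B, iters=2000):
    """Dykstra's alternating projections onto S_+^n ∩ N^n."""
    X = B.copy()
    P = np.zeros_like(B)    # correction term for the PSD set
    Q = np.zeros_like(B)    # correction term for the nonnegative set
    for _ in range(iters):
        Y = proj_psd(X + P)
        P = X + P - Y
        X = proj_nonneg(Y + Q)
        Q = Y + Q - X
    return X

rng = np.random.default_rng(5)
n = 4
B = (lambda M: (M + M.T) / 2)(rng.standard_normal((n, n)))
X = dnn_projection(B)
assert X.min() >= 0                            # entrywise nonnegative
assert np.linalg.eigvalsh(X).min() > -1e-5     # (numerically) positive semidefinite
```

Unlike plain alternating projections, Dykstra's correction terms make the iterates converge to the actual projection of $B$, not merely to some point of the intersection.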

7.1.3 Robinson’s constraint qualification:

Robinson’s CQ gives conditions under which the KKT conditions are necessary at a local minimizer $\bar x$.

These conditions ensure the existence of Lagrange multipliers and the boundedness of $\mathcal M(\bar x)$.

Robinson’s condition generalizes the LICQ.

  • Definition

    $\bar x$ is a feasible solution. Robinson’s CQ holds at $\bar x$ if
    $$
    0 \in \operatorname{int} \left( G(\bar x) + G'(\bar x) \mathcal X -\mathcal K \right)
    $$

example:

  1. When $\mathcal X=\mathbb R^n$, $G(x)=g(x)$, $\mathcal K=\{\mathbf 0_m\}$, Robinson’s CQ reduces to $0\in \mathrm{int}(G'(\bar x) \mathbb R^n)$, which is equivalent to $\mathrm{Range}(G'(\bar x))=\mathbb R^m \iff G'(\bar x)^T$ is one-to-one.

  2. For NLP, the Robinson’s CQ can be written as
    $$
    0\in\mathrm {int} \left( \begin{pmatrix} g(\bar x) \\ h(\bar x)\end{pmatrix} + \begin{pmatrix} g'(\bar x) \\ h'(\bar x)\end{pmatrix}\mathbb R^n - \begin{pmatrix} \{\mathbf 0_m\} \\ \mathbb R_+^p \end{pmatrix} \right)
    $$
    ==MFCQ==:
    $$
    \begin{array}{l}
    \left\{\nabla g_{i}(\bar{x}) \mid i=1, \ldots, m\right\} \ \text{are linearly independent} \\
    \exists d \in \mathbb{R}^{n} \ \text{s.t.} \ \left\langle\nabla g_{i}(\bar{x}), d\right\rangle = 0 \ \ \forall i=1, \ldots, m, \quad \left\langle\nabla h_{j}(\bar{x}), d\right\rangle > 0 \ \ \forall j \in J(\bar{x})
    \end{array}
    $$
    where $J(\bar x) = \{ j \mid h_j(\bar x)=0 \}$ is the active index set. Write $B_{g}(\bar{x})=\left[\nabla g_{1}(\bar{x}), \ldots, \nabla g_{m}(\bar{x})\right]$, $B_{h}(\bar{x})=\left[\nabla h_{j}(\bar{x}): j \in J(\bar{x})\right]$, and $B(\bar{x})=\left[B_{g}(\bar{x}), B_{h}(\bar{x})\right] \in \mathbb{R}^{n \times(m+|J|)}$.
    This is equivalent to saying that $B_g(\bar x)$ has full column rank and the system
    $$
    B_g(\bar x)^Td = \mathbf 0_m, \quad B_h(\bar x)^Td > 0
    $$
    has a solution.

    In the case when $\bar x$ satisfies the LICQ, $B(\bar x)$ has full column rank and hence $B(\bar x)^T$ has full row rank. In this case, $\mathrm{Range}(B(\bar x)^T) = \mathbb R^{m+|J|}$ and the linear system $B(\bar x)^T d = \mathrm{rhs}$ has a solution for any given vector $\mathrm{rhs}$; in particular, for $\mathrm{rhs} = [\mathbf 0_m; \mathbf 1_{|J|}]$.
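The last paragraph can be checked mechanically: under LICQ, $B(\bar x)^T d = \mathrm{rhs}$ is solvable for any right-hand side. A toy instance (the gradients are chosen by us):

```python
import numpy as np

# Toy instance: grad g = (1, 1); one active inequality with grad h = (1, 0)
Bg = np.array([[1.0], [1.0]])            # columns: grad g_i(x_bar)
Bh = np.array([[1.0], [0.0]])            # columns: grad h_j(x_bar), j in J(x_bar)
B = np.hstack([Bg, Bh])

# LICQ: B has full column rank, so B^T d = rhs is solvable for any rhs
assert np.linalg.matrix_rank(B) == B.shape[1]

rhs = np.array([0.0, 1.0])               # [0_m ; 1_{|J|}]
d, *_ = np.linalg.lstsq(B.T, rhs, rcond=None)

assert np.allclose(Bg.T @ d, 0)          # <grad g_i, d> = 0 for all i
assert (Bh.T @ d > 0).all()              # <grad h_j, d> > 0 for active j: MFCQ holds
```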

Proposition:

Assume $G(\bar x)\in \mathcal K$. The following conditions are equivalent to each other and to Robinson’s CQ:
$$
\begin{array}{l}{\text { (a) } \quad G^{\prime}(\bar{x}) \mathcal{X}+T_{\mathcal{K}}(G(\bar{x}))=\mathcal{Y}} \\ {\text { (b) } \quad\left[G^{\prime}(\bar{x}) \mathcal{X}\right]^{\perp} \cap N_{\mathcal{K}}(G(\bar{x}))=\{0\}}\end{array}
$$
For a set $S$ in $\mathcal Y$ and a point $y \in S$, the tangent cone to $S$ at $y$ is defined by
$$
\begin{aligned} T_{S}(y) &=\left\{d \in \mathcal{Y} \mid \exists y^{k} \in S \rightarrow y,\ t_{k} \downarrow 0, \text { s.t. }\left(y^{k}-y\right) / t_{k} \rightarrow d\right\} \\ &=\left\{d \in \mathcal{Y} \mid \exists t_{k} \downarrow 0, \ \operatorname{dist}\left(y+t_{k} d, S\right)=o\left(t_{k}\right)\right\} \end{aligned}
$$
Note that if $S$ is a nonempty convex subset of $\mathcal Y$, then
$$
T_S(y) = \mathrm{closure}(\{ t(x-y) \mid x\in S,\ t\ge 0 \})
$$
Remark:

  • For $\mathcal C = \mathbb R_+^p$ and $y\in \mathcal C$, let $J(y)=\{j \mid y_j=0\}$. We have that
    $$
    T_{\mathcal C}(y) = \{ d\in \mathbb R^p \mid d_j\ge 0, \ j\in J(y) \}
    $$

  • For NLP, the above proposition shows that Robinson’s CQ is equivalent to the following form:
    $$
    \left(\begin{array}{c}{g^{\prime}(\bar{x})} \\ {h^{\prime}(\bar{x})}\end{array}\right) \mathbb{R}^{n}+\left(\begin{array}{c}{\left\{0_{m}\right\}} \\ {T_{\mathbb{R}_{+}^{p}}(h(\bar{x}))}\end{array}\right)=\left(\begin{array}{c}{\mathbb{R}^{m}} \\ {\mathbb{R}^{p}}\end{array}\right)
    $$

Proposition:

Assume that $G(\bar x) \in \mathcal K$. If $\mathcal K$ has a nonempty interior, then Robinson’s CQ is equivalent to
$$
\exists d \in \mathcal X \quad \text{s.t.} \quad G(\bar x) + G'(\bar x)d \in \mathrm{int}(\mathcal K)
$$
Proposition:

$G(x)=(g(x),h(x))$, $\mathcal K=\{\mathbf 0_m\}\times \mathcal C$, $G(\bar x)\in \mathcal K$. If $\mathcal C$ has a nonempty interior, then Robinson’s CQ is equivalent to
$$
\left\{\begin{array}{l}{g^{\prime}(\bar{x}) \text { is onto; }} \\ {\exists d \in \mathcal{X} \quad \text { such that } \quad g^{\prime}(\bar{x}) d=0, \quad h(\bar{x})+h^{\prime}(\bar{x}) d \in \operatorname{int}(\mathcal{C})}\end{array}\right.
$$

7.2 First order necessary conditions

Proposition:

Suppose $\bar x$ is a locally optimal solution of COP. Then the point $d=0$ is an optimal solution of the following problem:
$$
\min_{d\in \mathcal X} f'(\bar x)d \\
\text{s.t. } d\in T_\mathcal F(\bar x)
$$
where $\mathcal F = \{x\in \mathcal X \mid G(x)\in \mathcal K\}$ is the feasible region of COP.

Lemma:

If Robinson’s CQ holds at a feasible point $\bar x$ of COP, then there exists a constant $c>0$ such that
$$
\mathrm {dist}(x,G^{-1}(\mathcal K + y)) \le c \ \mathrm{dist}(G(x)-y, \mathcal K)
$$
for all $(x,y)$ in an open neighborhood of $(\bar x,0)$.

  • In particular, when $y=0$, we have for all $x$ in a neighborhood of $\bar x$,
    $$
    \mathrm {dist}(x,\mathcal F) = O(\mathrm{dist}(G(x), \mathcal K))
    $$

    This lemma is very important because it can be used to measure the distance from the current point to the feasible set. For example, if we have the constraint $Ax=b$, it is difficult to know the distance $\mathrm{dist}(x^c,\mathcal F)$ directly, but it is easy to calculate $\|b-Ax^c\|$. Since $\mathrm{dist}(x^c,\mathcal F)$ is bounded by a multiple of $\|b-Ax^c\|$, we can use $\|b-Ax^c\|$ to estimate the distance.

  • One has
    $$
    0\in \mathrm{int}(G(\mathcal X)-\mathcal K)
    $$
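For affine constraints $Ax=b$ the error bound can be observed directly, since both $\mathrm{dist}(x,\mathcal F)$ and the residual $\|b-Ax\|$ are computable; with $c = 1/\sigma_{\min}(A)$ the bound holds. A sketch with random data (our own toy instance):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 6
A = rng.standard_normal((m, n))          # full row rank with probability 1
b = rng.standard_normal(m)
x = rng.standard_normal(n)               # current (generally infeasible) point

# exact distance to F = {x : Ax = b}: minimum-norm correction via least squares
dx, *_ = np.linalg.lstsq(A, b - A @ x, rcond=None)
dist = np.linalg.norm(dx)

residual = np.linalg.norm(b - A @ x)
c = 1.0 / np.linalg.svd(A, compute_uv=False).min()   # c = 1 / sigma_min(A)
assert dist <= c * residual + 1e-10
```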

Proposition:

Suppose $\bar x$ is a locally optimal solution of COP. If Robinson’s CQ holds at $\bar x$, then it holds that
$$
T_\mathcal F(\bar x) = \{ d\in \mathcal X \mid G'(\bar x)d\in T_\mathcal K(G(\bar x)) \}
$$
Hence $d=0$ is an optimal solution of the following problem
$$
\text{(LCOP)} \quad \min_{d\in \mathcal X} \{ f'(\bar x)d \ \vert \ G'(\bar x)d \in T_{\mathcal K} (G(\bar x)) \}
$$
Theorem: (Important)

Suppose that $\bar x$ is a locally optimal solution of (COP). Then, $\mathcal M(\bar x)$ is a nonempty, convex compact subset of $\mathcal Y$ if and only if Robinson’s CQ holds at $\bar x$.

Proposition (Uniqueness of Lagrange Multipliers)

Suppose that $\bar x$ is a locally optimal solution to (COP) and $\mathcal M(\bar x) \neq \emptyset$. Let $\mu_0 \in \mathcal M(\bar x)$.

  • $\mathcal M(\bar x)$ is a singleton if and only if
    $$
    \left[G^{\prime}(\bar{x}) \mathcal{X}\right]^{\perp} \cap R_{D}\left(-\mu_{0}\right)=\{0\}
    $$
    where $D = \mathcal N_{\mathcal K}(G(\bar x))$.

  • Then, the following condition is sufficient for uniqueness of $\mu_0$:
    $$
    \left[G^{\prime}(\bar{x}) \mathcal{X}\right]+\left(T_{\mathcal{K}}(G(\bar{x})) \cap\left[-\mu_{0}\right]^{\perp}\right)=\mathcal{Y}
    $$