3. Convex Analysis

3.1 Convex set

To prove that a given set $C$ is convex, one can:

  1. Take two arbitrary points $\mathbb x, \mathbb y \in C$ and any $\lambda \in [0, 1]$, and argue that $\lambda \mathbb x + (1-\lambda)\mathbb y \in C$.
  2. If $C_1, C_2, \cdots, C_m$ are convex sets, then $C= \cap_{i=1}^m C_i$ is also convex.

For any $m\times n$ matrix $\mathbf A$ and an $m$-column vector $\mathbb b$, the polyhedron or polyhedral set $\{\mathbb x\in\mathbb R^n \mid \mathbf A \mathbb x \leq \mathbb b\}$ is convex. Indeed, if $\mathbf A\mathbb x \le \mathbb b$ and $\mathbf A\mathbb y \le \mathbb b$, then $\mathbf A(\lambda\mathbb x + (1-\lambda)\mathbb y) = \lambda\mathbf A\mathbb x + (1-\lambda)\mathbf A\mathbb y \le \mathbb b$ for any $\lambda\in[0,1]$.

  1. Suppose $D\subset \mathbb R^n$ is convex. If $f:D\rightarrow \mathbb R$ is convex, then for any $\alpha \in \mathbb R$, the sublevel set $S_\alpha = \{ \mathbb x \in D \mid f(\mathbb x) \le \alpha\}$ is convex.
  2. Epigraph of $f$ : $E_f=\{[\mathbb x; \alpha] \mid \mathbb x\in D, \alpha \in \mathbb R, f(\mathbb x) \leq \alpha \}$ is a convex set $\iff$ $f$ is a convex function.

3.2 Convex function

3.2.1 Prove convexity of a function

  • Definition:

    • $f(\lambda \mathbb x + (1-\lambda)\mathbb y) \leq \lambda f(\mathbb x) + (1-\lambda) f(\mathbb y)$ for all $\mathbb x, \mathbb y \in D$ and $\lambda\in[0,1]$.

    • If $f_1, f_2 : D\rightarrow \mathbb R$ are convex functions on a convex set $D \subset \mathbb R^n$, then $f_1 + f_2$ is convex, and $\alpha f_1$ is convex for any $\alpha >0$.

    • Let $f_1, f_2, \cdots, f_k: D\rightarrow \mathbb R$ be convex on a convex set $D \subset \mathbb R^n$. If $\alpha_j \ge 0$ for all $j$, then $f(\mathbb x)=\sum\limits_{j=1}^k \alpha_j f_j(\mathbb x)$ is convex.

    • $h: D\rightarrow \mathbb R$ is convex and $g:\mathcal X \rightarrow \mathbb R$ is a nondecreasing convex function with $h(D)\subset \mathcal X$. Then the composite function $f=g \circ h$ is convex.

      The objective function $f(\mathbb x) = \frac{1}{2} \| \mathbf A\mathbb x-\mathbb b \|^2 + \lambda \|\mathbb x \|_1$ of the sparse regression problem is convex: the first term is a convex quadratic and the second is a positive multiple of a norm, so their sum is convex (a numerical check appears after this list).

  • Epigraph:

    • Epigraph of $f$ : $E_f=\{[\mathbb x; \alpha] \mid \mathbb x\in D, \alpha \in \mathbb R, f(\mathbb x) \leq \alpha \}$ is a convex set $\iff$ $f$ is a convex function.
  • Jensen’s inequality:

    • $f$ is a convex function on a convex set $S$; let $\mathbb x = \sum\limits_{j=1}^k \lambda_j \mathbb x^{(j)}$, where $\mathbb x^{(j)}\in S$, $\sum\limits_{j=1}^k \lambda_j =1$ and $\lambda_j \ge 0$. Then
      $$
      f(\mathbb x) \leq \sum\limits_{j=1}^k \lambda_jf(\mathbb x^{(j)})
      $$
      When the function is differentiable or twice differentiable, the following characterizations can also be used to check convexity.
  • Tangent plane characterization:

    • $f(\mathbb x)$ has continuous first partial derivatives on an open convex set $S$. Then the function $f$ is convex $\iff$
      $$
      \underbrace{f(\mathbb x) + \nabla f(\mathbb x)^T (\mathbb y - \mathbb x)}_{\text{tangent plane}} \leq f(\mathbb y), \quad \forall \mathbb x, \mathbb y \in S
      $$
      This means that the tangent plane lies below the graph of a convex function.

    • $f(\mathbb x)$ is differentiable on the open convex subset $S$. Then $f$ is convex on $S$ $\iff$
      $$
      \text{(monotone gradient condition)} \quad \langle \nabla f(\mathbb y)-\nabla f(\mathbb x), \mathbb y - \mathbb x \rangle \ge 0, \quad \forall \mathbb x, \mathbb y \in S
      $$

      (The two vectors point in broadly the same direction: the angle between them is between 0° and 90°.)

  • Test for convexity of a twice differentiable function (second partial derivatives)

    • $f(\mathbb x)$ has continuous second partial derivatives on an open convex set $D$ in $\mathbb R^n$.

      1. $f$ is convex on $D$ $\iff$ the Hessian matrix $H_f(\mathbb x)$ is positive semidefinite for all $\mathbb x\in D$

      2. If $H_f(\mathbb x)$ is positive definite for all $\mathbb x\in D$, then $f$ is strictly convex on $D$.
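As a quick numerical sanity check of the tests above (not a proof), the sketch below verifies the chord inequality and the positive semidefiniteness of the Hessian of the smooth part for the sparse-regression objective; the data `A`, `b` and the parameter `lam` are made-up illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # illustrative data (assumption)
b = rng.standard_normal(20)
lam = 0.1

def f(x):
    """Sparse-regression objective 0.5*||Ax - b||^2 + lam*||x||_1."""
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(x, 1)

# 1) Definition-based check: f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    t = rng.uniform()
    assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-10

# 2) Second-derivative test for the smooth part 0.5*||Ax - b||^2:
#    its Hessian A^T A must be positive semidefinite.
eigvals = np.linalg.eigvalsh(A.T @ A)
assert eigvals.min() >= -1e-10
print("chord inequality and Hessian PSD check passed")
```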

3.2.2 Equivalent statement of convex function

For a differentiable function $f$ on a convex set $S$, the following are equivalent:

(i) $f(\mathbb{y}) \ge f(\mathbb{x})+\nabla f(\mathbb{x})^T(\mathbb{y}-\mathbb{x}), \quad \forall \mathbb x, \mathbb y \in S$ (tangent plane characterization)

(ii) $\langle \nabla f(\mathbf{y}) - \nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x} \rangle \ge 0, \quad \forall \mathbf{x}, \mathbf{y} \in S$ (monotone gradient condition)

(iii) $f(\lambda \mathbf x + (1 - \lambda)\mathbf y) \le \lambda f(\mathbf x) + (1 - \lambda)f(\mathbf y), \quad \forall \mathbf x, \mathbf y\in S, \lambda\in[0,1]$ (definition of convexity)

3.2.3 Strongly Convex Function

Strongly convex function (with modulus $c>0$): for all $\mathbf x, \mathbf y$ and $\lambda\in[0,1]$,
$$
f(\lambda \mathbf{x}+(1-\lambda) \mathbf{y}) \leq \lambda f(\mathbf{x})+(1-\lambda) f(\mathbf{y})-\frac{c}{2} \lambda(1-\lambda)\|\mathbf{x}-\mathbf{y}\|^{2}
$$
Theorem 3.3:

Suppose $f : D → \mathbb R$ is a differentiable function. Then the following statements are equivalent:

(i) $f(\mathbf{y}) \geq f(\mathbf{x})+\langle\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle+\frac{c}{2}\|\mathbf{y}-\mathbf{x}\|^{2}, \quad \forall \mathbf{x}, \mathbf{y} \in D$

(ii) $g(\mathbf{x})=f(\mathbf{x})-\frac{c}{2}\|\mathbf{x}\|^{2}$ is convex

(iii) $\langle\nabla f(\mathbf{x})-\nabla f(\mathbf{y}), \mathbf{x}-\mathbf{y}\rangle \geq c\|\mathbf{x}-\mathbf{y}\|^{2}, \quad \forall \mathbf{x}, \mathbf{y} \in D$

(iv) $f$ is strongly convex.

Proof:

( (i) $\iff$ (ii) )

“ $\Longleftarrow$ “

If $g(\mathbf{x})=f(\mathbf{x})-\frac{c}{2}\|\mathbf{x}\|^{2}$ is convex, then we have
$$
\begin{aligned}
g(\mathbb y) &\geq g(\mathbb x) + \nabla g(\mathbb x)^T(\mathbb y - \mathbb x)\\
\Rightarrow f(\mathbb y) -\frac{c}{2}\|\mathbb{y}\|^{2} &\ge f(\mathbb{x})-\frac{c}{2}\|\mathbb{x}\|^{2} + (\nabla f(\mathbb x) - c \mathbb x)^T (\mathbb y - \mathbb x)\\
\Rightarrow f(\mathbb y) &\ge f(\mathbb x) + \nabla f(\mathbb x)^T(\mathbb y - \mathbb x) + \frac{c}{2}\|\mathbb{y}\|^{2} - \frac{c}{2}\|\mathbb{x}\|^{2} - c\mathbb x^T(\mathbb y - \mathbb x) \\
&= f(\mathbb x) + \nabla f(\mathbb x)^T(\mathbb y - \mathbb x) + \frac{c}{2} (\|\mathbb{y}\|^{2} - 2\mathbb x^T \mathbb y + \|\mathbb{x}\|^{2})\\
&= f(\mathbb x) + \langle\nabla f(\mathbb{x}), \mathbb{y}-\mathbb{x}\rangle+\frac{c}{2}\|\mathbb{y}-\mathbb{x}\|^{2}
\end{aligned}
$$

“ $\Longrightarrow$ “

This direction can be proved by reversing the algebra above and using the tangent-plane characterization of convexity for $g$.

( (ii) $\iff$ (iii) )

Follows from the monotone gradient condition applied to $g$.

( (ii) $\iff$ (iv) )

Follows from the definition of convexity applied to $g$.

Let $\mathbb{x}^*$ be a minimizer of $f$ over $D$. Then, for a strongly convex $f$ with modulus $c$,
$$
\begin{array}{l}
\text{(a) } \frac{1}{2 c}\|\nabla f(\mathbb{x})\|^{2} \geq f(\mathbb{x})-f(\mathbb{x}^{*}) \\
\text{(b) } \|\nabla f(\mathbb{x})-\nabla f(\mathbb{y})\| \geq c\|\mathbb{x}-\mathbb{y}\| \\
\text{(c) } f(\mathbb{y}) \leq f(\mathbb{x})+\langle\nabla f(\mathbb{x}), \mathbb{y}-\mathbb{x}\rangle+\frac{1}{2 c}\|\nabla f(\mathbb{y})-\nabla f(\mathbb{x})\|^{2} \\
\text{(d) } \langle\nabla f(\mathbb{x})-\nabla f(\mathbb{y}), \mathbb{x}-\mathbb{y}\rangle \leq \frac{1}{c}\|\nabla f(\mathbb{x})-\nabla f(\mathbb{y})\|^{2}
\end{array}
$$

Proof.

(a)

We already have $f(\mathbf{y}) \geq f(\mathbf{x})+\langle\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle+\frac{c}{2}\|\mathbf{y}-\mathbf{x}\|^{2}$. For fixed $\mathbb x$, minimizing the right-hand side over $\mathbb y$ (the minimum is attained at $\mathbb y = \mathbb x - \frac{1}{c}\nabla f(\mathbb x)$) gives $f(\mathbf{x})+\langle\nabla f(\mathbf{x}), \mathbf{y}-\mathbf{x}\rangle+\frac{c}{2}\|\mathbf{y}-\mathbf{x}\|^{2} \ge f(\mathbb x) - \frac{1}{2c} \|\nabla f(\mathbb x)\|^2$. Then, we have $f(\mathbf{y}) \ge f(\mathbb x) - \frac{1}{2c} \|\nabla f(\mathbb x)\|^2$ for all $\mathbb y$. Taking $\mathbb y = \mathbb x^*$ gives (a). Q.E.D.

This means that when the gradient norm is small, the current function value is close to the optimal value.

(b)

Follows from (iii) by the Cauchy–Schwarz inequality.

(c)

Define a function $\phi_{\mathbb x} (\mathbb z) = f(\mathbb z) - \langle \nabla f(\mathbb x), \mathbb z \rangle$, which is also strongly convex with modulus $c$. Find its minimizer:

$\nabla \phi_{\mathbb x}(\mathbb z) = \nabla f(\mathbb z) - \nabla f(\mathbb x) = 0 \Longrightarrow \mathbb z^* = \mathbb x$. Then, applying (a) to $\phi_{\mathbb x}$ at the point $\mathbb y$, we have $\frac{1}{2 c}\|\nabla f(\mathbb{y}) - \nabla f(\mathbb{x})\|^{2} \geq \phi_{\mathbb x}(\mathbb y) - \phi_{\mathbb x}(\mathbb x) = f(\mathbb{y}) - f(\mathbb x) - \langle\nabla f(\mathbb x), \mathbb y - \mathbb x \rangle$, which rearranges to (c).

(d)

Interchanging $\mathbb x$ and $\mathbb y$ in (c), we get
$$
f(\mathbb{x}) \leq f(\mathbb{y})+\langle\nabla f(\mathbb{y}), \mathbb{x}-\mathbb{y}\rangle+\frac{1}{2 c}\|\nabla f(\mathbb{x})-\nabla f(\mathbb{y})\|^{2}
$$
Combining the two inequalities, we can get (d). Q.E.D.
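A small numerical illustration of (a)–(d): for a strongly convex quadratic $f(\mathbb x)=\frac{1}{2}\mathbb x^T\mathbf Q\mathbb x + \mathbb q^T\mathbb x$ one may take the modulus $c=\lambda_{\min}(\mathbf Q)>0$. The sketch below, with made-up data, checks the four inequalities at random points.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M @ M.T + np.eye(4)          # toy positive definite matrix (assumption)
q = rng.standard_normal(4)

f = lambda x: 0.5 * x @ Q @ x + q @ x
grad = lambda x: Q @ x + q
c = np.linalg.eigvalsh(Q).min()  # strong convexity modulus
x_star = np.linalg.solve(Q, -q)  # unconstrained minimizer

for _ in range(500):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    gx, gy = grad(x), grad(y)
    # (a) 1/(2c)*||grad f(x)||^2 >= f(x) - f(x*)
    assert gx @ gx / (2 * c) >= f(x) - f(x_star) - 1e-9
    # (b) ||grad f(x) - grad f(y)|| >= c*||x - y||
    assert np.linalg.norm(gx - gy) >= c * np.linalg.norm(x - y) - 1e-9
    # (c) f(y) <= f(x) + <grad f(x), y-x> + 1/(2c)*||grad f(y) - grad f(x)||^2
    assert f(y) <= f(x) + gx @ (y - x) + (gy - gx) @ (gy - gx) / (2 * c) + 1e-9
    # (d) <grad f(x) - grad f(y), x-y> <= 1/c*||grad f(x) - grad f(y)||^2
    assert (gx - gy) @ (x - y) <= (gx - gy) @ (gx - gy) / c + 1e-9
print("strong-convexity consequences (a)-(d) hold at all sampled points")
```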

3.3 Unconstrained Programming

3.3.1 Unconstrained Convex Programming

For a convex program, a local minimizer is also a global minimizer.

Suppose $\mathbb x^*$ is a stationary point, i.e. $\nabla f(\mathbb x^*)=0$. For any $\mathbb y\in D$, we have $f(\mathbb y) \ge f(\mathbb x^*)+\nabla f(\mathbb x^*)^T(\mathbb y - \mathbb x^*) = f(\mathbb x^*)$. Hence, $\mathbb x^*$ is a global minimizer.

3.3.2 Unconstrained Convex Quadratic Programming

Let $\mathbf Q$ be an $n\times n$ symmetric matrix and $\mathbf c \in \mathbb R^n$. The quadratic function
$$
q(\mathbb{x})=\frac{1}{2} \mathbb{x}^T \mathbf{Q} \mathbb{x} + \mathbb{c}^T \mathbb{x}
$$

is a convex function $\iff$ $\mathbf{Q}$ is positive semidefinite.

**Global minimizer:** $\mathbb{x}^*$ is a global minimizer $\iff$ $\mathbf{Q}\mathbb{x}^*=-\mathbb{c}$. If $\mathbf{Q}^{-1}$ exists, then $\mathbb{x}^*=-\mathbf{Q}^{-1}\mathbb{c}$.

==Exercise:== How to prove it?

Show that
$$
\inf_{\mathbb x\in \mathbb R^n} \left\{ \frac{1}{2}\langle\mathbb x, \mathbf Q\mathbb x \rangle - \langle \mathbb c, \mathbb x \rangle \right\} =
\begin{cases}
-\infty \quad & \text{if } \mathbb c\notin \operatorname{Range}(\mathbf Q) \\
-\frac{1}{2} \langle \mathbb w, \mathbf Q \mathbb w \rangle \quad & \text{if } \mathbb c=\mathbf Q \mathbb w
\end{cases}
$$
In the second case, the infimum is attained exactly at the points $\mathbb x$ satisfying
$$
\mathbf Q (\mathbb x - \mathbb w) = 0 \iff \mathbb x \in \mathbb w + \operatorname{Null}(\mathbf Q)
$$
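A minimal sketch of the two cases in the exercise, using a small singular $\mathbf Q$ built for illustration: when $\mathbb c\in\operatorname{Range}(\mathbf Q)$ the infimum $-\frac{1}{2}\langle\mathbb w,\mathbf Q\mathbb w\rangle$ is attained on $\mathbb w+\operatorname{Null}(\mathbf Q)$; otherwise the objective is unbounded below along a null direction.

```python
import numpy as np

# Toy singular PSD matrix: Range(Q) = span{e1, e2}, Null(Q) = span{e3}
Q = np.diag([2.0, 1.0, 0.0])
obj = lambda x, c: 0.5 * x @ Q @ x - c @ x

# Case 1: c in Range(Q). Any w with Q w = c attains the infimum -0.5*<w, Qw>.
c1 = np.array([2.0, 1.0, 0.0])
w = np.array([1.0, 1.0, 0.0])                       # satisfies Q w = c1
print(obj(w, c1), -0.5 * w @ Q @ w)                 # both equal -1.5
print(obj(w + np.array([0.0, 0.0, 7.0]), c1))       # still -1.5: w + Null(Q)

# Case 2: c not in Range(Q): the objective is unbounded below along Null(Q).
c2 = np.array([0.0, 0.0, 1.0])
for t in [1e2, 1e4, 1e6]:
    print(obj(t * np.array([0.0, 0.0, 1.0]), c2))   # -1e2, -1e4, -1e6
```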

3.4 Constrained Convex Programming

3.4.1 Minimizer

Theorem 3.7.

Let $f : C → \mathbb R$ be a convex and continuously differentiable function on the convex set $C ⊂ \mathcal{E}$. Consider the constrained minimization problem
$$
\min \{ f(x) \mid x\in C\}
$$
Then, $x^*\in C$ is a global minimizer $\iff$
$$
\langle \nabla f(x^*), x-x^* \rangle \ge 0, \quad \forall x \in C
$$
Using the normal cone, this can be rephrased as:

$x^*$ is a global minimizer of $\min \{ f(x) \mid x\in C\}$ $\iff$ $-\nabla f(x^*) \in N_C (x^*)$

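To illustrate the optimality condition, the sketch below minimizes a made-up convex quadratic over $C=\mathbb R^n_+$ by projected gradient descent and then checks $\langle\nabla f(x^*), x-x^*\rangle\ge 0$ at random feasible points, i.e. $-\nabla f(x^*)\in N_C(x^*)$.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
Q = M @ M.T + np.eye(5)                    # toy positive definite Hessian (assumption)
c = rng.standard_normal(5)
grad = lambda x: Q @ x + c                 # f(x) = 0.5 x'Qx + c'x

# Projected gradient descent onto C = R^n_+ (the projection is max(., 0))
x = np.ones(5)
step = 1.0 / np.linalg.eigvalsh(Q).max()
for _ in range(5000):
    x = np.maximum(x - step * grad(x), 0.0)

# Optimality check: <grad f(x*), y - x*> >= 0 for random feasible y
g = grad(x)
for _ in range(1000):
    y = np.abs(rng.standard_normal(5))     # random point in C
    assert g @ (y - x) >= -1e-6
print("x* =", np.round(x, 4), "satisfies the first-order condition on C")
```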
3.4.2 Projection

Theorem 3.8. (Projection Theorem)

Let $C$ be a closed convex set in $\mathcal{E}$.

(a) For every $z ∈ \mathcal{E}$, there exists a unique minimizer (denoted $Π_C(z)$ and called the projection of $z$ onto $C$) of
$$
\min \left\{ \frac{1}{2} \|x-z\|^2 \;\middle|\; x\in C \right\}
$$
(b) $x^* := \Pi_C (z)$ is the projection of $z$ onto $C$ iff
$$
\langle z-x^*, x-x^* \rangle \leq 0, \forall x\in C
$$
(c) [firmly nonexpansive property] For any $z, w \in \mathcal{E}$
$$
\| \Pi_C(z) - \Pi_C(w) \|^2 \leq \langle z-w, \Pi_C(z) - \Pi_C(w) \rangle
$$
Hence $\| \Pi_C(z) - \Pi_C(w) \| \leq \|z-w \|$, i.e. $\Pi_C(\cdot)$ is Lipschitz continuous with modulus 1.

Proof.

(a).

Let $\bar x \in C$ and set $S = \{x\in C \mid f(x)\le f(\bar x) \}$, where $f(x)=\frac{1}{2}\|x-z\|^2$. The set $S$ is closed and bounded, so by the Weierstrass theorem there exists $x^* \in S$ such that $f(x^*) \le f(x)$ for all $x\in S$. In particular, since $\bar x \in S$, we have $f(x^*)\le f(\bar x)$.

Besides, we can also have
$$
f(x)>f(\bar x)\ge f(x^*), \quad \forall x\in C\backslash S
$$
Thus we have shown that $f(x^*)\le f(x)$ for all $x\in S \cup (C\backslash S) = C$.

Since the function $f(x)= \frac{1}{2} \|x-z\|^2$ is strictly convex on $C$, the minimizer is unique.

(b).

Based on Theorem 3.7 and $\nabla f(x) = x - z$, we have that
$$
0\le \langle \nabla f(x^*), x-x^* \rangle = \langle x^*-z, x-x^* \rangle, \quad \forall x\in C
$$
(c).

Based on (b), for any $w,z \in \mathcal E$ we have
$$
\begin{aligned}
\langle z-\Pi_C(z), \Pi_C(w)-\Pi_C(z)\rangle &\le 0 \\
\langle w-\Pi_C(w), \Pi_C(z)-\Pi_C(w)\rangle &\le 0
\end{aligned}
$$
Adding the two inequalities gives (c).

Remark.

If $C$ is a linear subspace of $\mathbb R^n$, then $z-x^*\perp C$. This means
$$
z = \Pi_C(z) + (z-\Pi_C(z)), \quad \text{and } \quad \langle z-\Pi_C(z), \Pi_C(z)\rangle = 0
$$
More generally, if $C$ is a closed convex cone, we have
$$
\langle z-\Pi_C(z), \Pi_C(z)\rangle = 0
$$

Example:

  1. $C = \mathbb R_+^n$. For any $z\in \mathbb R^n$, $\Pi_C(z)=\max \{z,0\}$ (componentwise).

    Proof.

    $\Pi_C(z)$ is the minimizer of $\min \{ \frac{1}{2} \|x-z\|^2 \mid x\in \mathbb R_+^n \}$. We have
    $$
    \|x-z\|^2=(x_1^2-2z_1x_1 + z_1^2) + \cdots + (x_n^2 -2z_nx_n + z_n^2)
    $$
    Each term can be minimized separately over $x_i\ge 0$, giving $x_i = \max\{z_i, 0\}$. So, we have $\Pi_C(z)=\max \{z,0\}$.

  2. $C=\{ x\in \mathbb R^n \mid \sum_{i=1}^n x_i = 1, x \ge 0 \}$ is the standard simplex. Suppose $z\in \mathbb R^n$ satisfies $z_1 \ge z_2 \ge \cdots \ge z_n$. Let $k$ be the maximum index in the set $\{ 1\leq j \leq n \mid z_j + \frac{1}{j} (1-\sum_{i=1}^jz_i)>0 \}$. Set
    $$
    \lambda = \frac{1}{k}\left(1-\sum_{i=1}^kz_i\right)
    $$
    Then the projection onto the simplex $C$ is given by
    $$
    \Pi_C(z) = \max \{z+\lambda e, 0\} \quad (e \text{ is the vector of all ones})
    $$
    Proof.

    Proved in a later chapter; see also Assignment 1.

  3. $C=\mathbb S_+^n$, the cone of $n\times n$ symmetric positive semidefinite matrices. For a given $A\in \mathbb S^n$, let the eigenvalue decomposition of $A$ be $A=U \operatorname{diag}(d) U^T$. Then
    $$
    \Pi_C(A) = U \operatorname{diag}(\max\{d,0\}) U^T
    $$
    Proof.

    Note that $U$ is orthogonal, so $\|UY\|_F = \|YU\|_F = \|Y\|_F$ for any $Y\in \mathbb S^n$.
    $$
    \begin{aligned}
    \min\left\{ \frac{1}{2} \|X-A\|_F^2 \;\middle|\; X\in \mathbb S_+^n \right\} &= \min\left\{ \frac{1}{2} \|U(U^TXU-\operatorname{diag}(d))U^T\|_F^2 \;\middle|\; X\in \mathbb S_+^n \right\}\\
    &= \min\left\{ \frac{1}{2} \|\bar X-\operatorname{diag}(d)\|_F^2 \;\middle|\; \bar X\in \mathbb S_+^n \right\} \quad (\text{where } \bar X = U^TXU)\\
    &= \min\left\{ \frac{1}{2} \|\operatorname{diag}(x)-\operatorname{diag}(d)\|_F^2 \;\middle|\; x\ge0 \right\} \quad (\text{the off-diagonal entries of the optimal } \bar X \text{ are } 0, \text{ so } \bar X= \operatorname{diag}(x))\\
    &= \min\left\{ \frac{1}{2} \|x-d\|^2 \;\middle|\; x\ge0 \right\}
    \end{aligned}
    $$
    Then, we have $x^* = \max\{d,0\}$.

    The optimal $\bar X^* = \operatorname{diag}(x^*) = \operatorname{diag}(\max\{d,0\})$. Hence the optimal $X^*=U\bar X^* U^T = U\operatorname{diag}(\max\{d,0\}) U^T$. (A code sketch of these three projections follows.)
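The three projections above are straightforward to implement; the sketch below codes them directly (the simplex projection follows the sorting recipe of example 2) and spot-checks the variational inequality of Theorem 3.8(b).

```python
import numpy as np

def proj_orthant(z):
    """Projection onto R^n_+ (example 1)."""
    return np.maximum(z, 0.0)

def proj_simplex(z):
    """Projection onto {x : sum x_i = 1, x >= 0} via sorting (example 2)."""
    u = np.sort(z)[::-1]                           # sort descending
    css = np.cumsum(u)
    ks = np.arange(1, len(z) + 1)
    k = ks[u + (1.0 - css) / ks > 0].max()         # largest admissible index
    lam = (1.0 - css[k - 1]) / k
    return np.maximum(z + lam, 0.0)

def proj_psd(A):
    """Projection onto the PSD cone (example 3): clip negative eigenvalues."""
    d, U = np.linalg.eigh(A)
    return U @ np.diag(np.maximum(d, 0.0)) @ U.T

rng = np.random.default_rng(4)
z = rng.standard_normal(6)
x = proj_simplex(z)
print(x.sum(), (x >= 0).all())                     # 1.0 True
# Theorem 3.8(b): <z - x*, y - x*> <= 0 for every y in the simplex
for _ in range(1000):
    y = rng.dirichlet(np.ones(6))                  # random point of the simplex
    assert (z - x) @ (y - x) <= 1e-9

A = rng.standard_normal((4, 4)); A = 0.5 * (A + A.T)
print(np.linalg.eigvalsh(proj_psd(A)).min() >= -1e-12)   # True: result is PSD
```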

3.4.3 Fréchet differentiability

Let $\mathcal E_1,\mathcal E_2$ be two finite-dimensional real Euclidean spaces. A function $f : \mathcal E_1 → \mathcal E_2$ is said to be Fréchet differentiable at $x ∈ \mathcal E_1$ if there exists a linear map $f′(x): \mathcal E_1 →\mathcal E_2$ such that, as $h→0$,
$$
f(x + h) − f(x) − f′(x)[h] = o(\|h\|).
$$
This means (generalizing the one-dimensional limit $\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}=f'(x)$)
$$
\lim_{h\rightarrow 0 } \frac{\|f(x+h)-f(x) - f'(x)[h]\|}{\|h\|} = 0
\iff f(x + h) - f(x) - f'(x)[h] = o(\|h\|)
$$
Proposition

Let $C$ be a nonempty closed convex set in $\mathcal E$. For any $x$, let $\theta (x) = \frac{1}{2} \| x-\Pi_C(x)\|^2$. Then $\theta(\cdot)$ is a continuously differentiable convex function and
$$
\nabla \theta(x) = x-\Pi_C(x)
$$
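A quick finite-difference check of this proposition for $C=\mathbb R^n_+$, where $\Pi_C(x)=\max\{x,0\}$: the numerical gradient of $\theta$ should match $x-\Pi_C(x)$.

```python
import numpy as np

proj = lambda x: np.maximum(x, 0.0)                 # C = R^n_+
theta = lambda x: 0.5 * np.linalg.norm(x - proj(x)) ** 2

rng = np.random.default_rng(5)
x = rng.standard_normal(6)
eps = 1e-6
num_grad = np.array([
    (theta(x + eps * e) - theta(x - eps * e)) / (2 * eps)   # central differences
    for e in np.eye(6)
])
print(np.allclose(num_grad, x - proj(x), atol=1e-5))        # True
```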

3.5 Convex Separation

Definition :

  1. affine hull: $\operatorname{aff}(S)$
    $$
    \operatorname{aff}(S) = \left\{ \sum_{i=1}^k \alpha_ix_i \;\middle|\; \sum_{i=1}^k \alpha_i=1, x_i\in S, \alpha_i\in \mathbb R, k>0 \right\}
    $$

  2. convex hull: $\operatorname{conv} (S)$:
    $$
    \operatorname{conv}(S)=\left\{\sum_{i=1}^{k} \alpha_{i} x_{i} \;\middle|\; \sum_{i=1}^{k} \alpha_{i}=1, x_{i} \in S, \alpha_{i} \geq 0, k>0\right\}
    $$

  3. relative interior: $\operatorname{ri}(S)$:
    $$
    \operatorname{ri} S=\left\{x \in \operatorname{aff} S \;\middle|\; \exists \varepsilon>0, B_{\varepsilon}(x) \cap \operatorname{aff} S \subset S\right\}
    $$

Example:

  • The affine hull of a singleton (a set made of one single element) is the singleton itself.
  • The affine hull of a set of two different points is the line through them.
  • The affine hull of a set of three points not on one line is the plane going through them.
  • The affine hull of a set of four points not in a plane in $\mathbb R^3$ is the entire space $\mathbb R^3$.
  • $\operatorname{aff}(\mathbb R_+^n) = \mathbb R^n$.

Proposition:

$C$ is a subset of a Euclidean space $\mathcal E$. If $C$ is convex, then
$$
\operatorname{aff}(\mathrm{ri} C)=\operatorname{aff} C=\operatorname{aff}(\mathrm{cl}(C))
$$
where $\operatorname{cl}(C)$ is the closure of $C$.

Any point outside a closed convex set can be separated from the set by a hyperplane.

Proposition:

Let $\Omega\subset \mathbb R^n$ be a nonempty closed convex set and let $\bar x \notin \Omega$. Then there exists a nonzero vector $v\in \mathbb R^n$ such that
$$
\sup \{ \langle v,x \rangle \mid x\in\Omega \} < \langle v, \bar x \rangle
$$

In fact, $v=\bar x - \Pi_\Omega(\bar x)$.
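A tiny numerical illustration with a made-up $\Omega$ (the closed unit Euclidean ball): for $\bar x$ outside the ball, $v=\bar x-\Pi_\Omega(\bar x)$ strictly separates $\bar x$ from $\Omega$.

```python
import numpy as np

def proj_ball(x):
    """Projection onto the closed unit Euclidean ball."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

x_bar = np.array([2.0, 1.0])            # a point outside the unit ball
v = x_bar - proj_ball(x_bar)            # separating direction

# sup over the ball of <v, x> equals ||v|| (attained at v/||v||); compare with <v, x_bar>
sup_omega = np.linalg.norm(v)
print(sup_omega, v @ x_bar, sup_omega < v @ x_bar)   # ... True
```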

Proposition:

Let $Ω_1$ and $Ω_2$ be nonempty, closed, convex subsets of $\mathbb R^n$ with $Ω_1 ∩ Ω_2 = ∅$. If $Ω_1$ or $Ω_2$ is bounded, then there is a nonzero element $v ∈ \mathbb R^n$ such that
$$
\sup \{ \langle v,x \rangle \mid x\in\Omega_1 \} < \inf \{ \langle v,y \rangle \mid y\in\Omega_2 \}
$$
Separation property in a subspace of $\mathbb R^n$, instead of in $\mathbb R^n$:

Let $L$ be a subspace of $\mathbb R^n$ and let $\Omega \subset L$ be a nonempty convex set with $\bar x\in L$ and $\bar x \notin \bar \Omega$ (the closure of $\Omega$). Then there exists a nonzero $v\in L$ such that
$$
\sup \{ \langle v,x \rangle \mid x\in \Omega\} < \langle v, \bar x \rangle
$$
Definition (properly separated):

We say that two nonempty convex sets $\Omega_1$ and $\Omega_2$ can be properly separated if there exists a nonzero vector $v \in \mathbb R^n$ such that
$$
\begin{array}{rcl}
\sup \left\{ \langle v, x\rangle \mid x \in \Omega_{1} \right\} &\leq& \inf \left\{ \langle v, y\rangle \mid y \in \Omega_{2}\right\}, \\
\inf \left\{ \langle v, x\rangle \mid x \in \Omega_{1}\right\} & < & \sup \left\{ \langle v, y\rangle \mid y \in \Omega_{2}\right\}.
\end{array}
$$
Proposition:

$\Omega$ is a nonempty convex set. Then $0\notin \operatorname{ri}(\Omega)$ $\iff$ the set $\Omega$ and $\{0\}$ can be properly separated, i.e., there exists a nonzero $v\in \mathbb R^n$ such that
$$
\begin{array}{c}
\sup \left\{\langle v, x\rangle \mid x \in \Omega \right\} \leq 0, \\
\inf \left\{\langle v, x\rangle \mid x \in \Omega \right\}<0.
\end{array}
$$
Theorem:

Let $\Omega_1$ and $\Omega_2$ be two nonempty convex subsets of $\mathbb R^n$. Then $\Omega_1$ and $\Omega_2$ can be properly separated $\iff$
$$
\operatorname{ri}(\Omega_1) \cap \operatorname{ri}(\Omega_2) = \emptyset
$$

Definition (Affine independence):

Elements $v_0, \cdots, v_m \in \mathbb R^n, m\ge 1$ are affinely independent, if
$$
\sum_{i=0}^m \lambda_i v_i = 0,\quad \sum_{i=0}^m \lambda_i = 0 \Longrightarrow \lambda_i = 0, \quad \forall i=0, \cdots,m
$$

This is like linear independence, but without requiring that the affine subspace spanned by the points contain the origin. For example, three points are affinely independent if the smallest affine set containing them is a plane; they are affinely dependent if they lie on a common line (or coincide).

Proposition

  1. A set $\Omega$ is affine $\iff$ $\Omega$ contains all affine combinations of its elements.
  2. If $\Omega_1, \Omega_2$ are affine sets, then $\Omega_1 \times \Omega_2$ is an affine set.
  3. Sums and scalar multiples of affine sets are affine.
  4. A set $\Omega$ is a linear subspace of $\mathbb R^n$ $\iff$ $\Omega$ is an affine set containing the origin.

3.6 Cones

Consider $p : \mathcal E → (−∞, ∞]$.
(a) $p$ is said to be a proper function if $p(x)$ is not identically equal to $∞$.

(b) $p$ is said to be closed if its epigraph $\operatorname{epi}(p) := \{(α, x) ∈ \mathbb R\times\mathcal E \mid p(x) \le α\}$ is a closed subset of $\mathbb R\times\mathcal E$.

(c) The domain of $p$ is defined to be the set $\operatorname{dom}(p)=\{x∈\mathcal E \mid p(x)<∞\}$.

[Linearity space] :

$C$ is a closed convex cone. The linearity space of $C$ is the set
$$
\operatorname{lin}(C) = C \cap (-C)
$$
It’s the largest linear subspace of $\mathcal E$ contained in $C$.

[Dual and polar cone] :

Dual cone: $S^* = \{ y\in \mathcal E \mid \langle x,y \rangle \ge 0, \forall x\in S \}$

Polar cone: $S^\circ = -S^*$

If $C^*=C$, then $C$ is self-dual.

Proposition:

Let $C$ be a cone in $\mathcal E$.

(a). $C^*$ is a closed convex cone.

(b). If $C$ is a nonempty closed convex cone, then $(C^*)^* = C$

Example.

If $C = \mathbb R_+^n$. Then

  • $\operatorname{aff} (C) = \mathbb R^n$
  • $\operatorname{lin} (C) = \{0\}$
  • $C^* = C$

3.7 Normal Cones

The normal cone of $C$ at $\bar x \in C$ is
$$
N_C(\bar x) = \{z\in \mathcal E \mid \langle z, x-\bar x \rangle \leq 0, \forall x\in C\}
$$

Proposition:

$C$ is a convex set and $\bar x \in C$.

  • $N_C(\bar x)$ is a closed and convex cone.
  • If $\bar x\in \operatorname{int}(C)$, then $N_C(\bar x) = \{0\}$
  • If $C$ is also a cone, then $N_C(\bar x) \subset C^\circ$.

Proposition:

$C\subset \mathbb R^n$ is a closed convex set. For any $y\in C$ and $u\in \mathbb R^n$,
$$
u\in N_C(y) \iff y=\Pi_C(y+u)
$$
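For $C=\mathbb R^n_+$ this equivalence is easy to check numerically: $u\in N_C(y)$ means $u\le 0$ with $u_i=0$ wherever $y_i>0$, and then $\max\{y+u,0\}$ returns $y$. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
proj = lambda x: np.maximum(x, 0.0)       # projection onto C = R^n_+

# Build y in C and u in N_C(y): u <= 0 and u_i = 0 whenever y_i > 0
y = np.array([1.5, 0.0, 2.0, 0.0])
u = np.array([0.0, -3.0, 0.0, -0.5])
print(np.array_equal(proj(y + u), y))     # True: y = Pi_C(y + u)

# Conversely, for random z, u := z - Pi_C(z) lies in N_C(Pi_C(z)):
z = rng.standard_normal(4)
y2 = proj(z)
u2 = z - y2
print(all(u2 @ (x - y2) <= 1e-9 for x in np.abs(rng.standard_normal((200, 4)))))
```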
Proposition:

Let $C_1, C_2 \subset \mathbb R^n$ be convex sets such that $\bar x \in C_1 \cap C_2$.

  • $N_{C_1} (\bar x) + N_{C_2} (\bar x) \subset N_{C_1\cap C_2} (\bar x)$

  • If the relative interior condition $\operatorname{ri}(C_1) \cap \operatorname{ri}(C_2) \neq \emptyset$ holds, then we have
    $$
    N_{C_1\cap C_2} (\bar x) \subset N_{C_1}(\bar x) + N_{C_2}(\bar x)
    $$

3.8 Subgradient

Definition:

A vector $v$ is a subgradient of $f$ at $x\in \operatorname{dom}(f)$ if
$$
f(z) \ge f(x)+\langle v,z-x \rangle, \quad \forall z\in \mathcal E
$$
The set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$, denoted $\partial f(x)$. If $x\notin \operatorname{dom}(f)$, we set $\partial f(x) = \emptyset$.

Proposition:

$\partial f$ is a monotone mapping, i.e.,
$$
\langle v-u, y-x \rangle \ge 0, \quad \forall u\in \partial f(x), v\in \partial f(y)
$$

example:

  • if $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$

  • $C$ is a convex subset of $\mathbb R^n$ and $\delta_C$ is its indicator function; then
    $$
    \partial \delta_C(x) = \begin{cases}
    \emptyset \quad & \text{if }x\notin C \\
    N_C(x)= \text{the normal cone of $C$ at $x$} \quad & \text{if }x\in C
    \end{cases}
    $$

  • Let $f(x) = \| x \|_1$ for $x \in \mathbb R^n$. Then, $\partial f(0) = \{ y\in \mathbb R^n \mid \|y\|_\infty \le 1 \}$

    Proof.

    “$\supseteq$”:

    Assume $\|y\|_\infty \le 1$. We need to prove
    $$
    f(z)\ge f(0) + \langle y, z-0 \rangle, \forall z\in \mathbb R^n \iff \|z\|_1 \ge \langle y,z \rangle, \forall z\in \mathbb R^n
    $$

    $$
    \langle y,z \rangle = y_1z_1 + \cdots + y_nz_n \le |y_1z_1| + \cdots + |y_nz_n| = |y_1||z_1| + \cdots + |y_n||z_n| \le |z_1|+\cdots+|z_n| = \|z\|_1.
    $$

    “$\subseteq$”:

    Assume $\|z\|_1 \ge \langle y,z \rangle$ for all $z\in \mathbb R^n$. We need to prove $\|y\|_\infty \le 1$. Prove it by contradiction.

    Assume $\|y\|_\infty >1$, say $|y_i| > 1$ for some index $i$. Take $z = \operatorname{sign}(y_i)\, e_i$, the $i$th standard basis vector with the sign of $y_i$. Then $\langle y, z \rangle = |y_i| > 1 = \|z\|_1$, a contradiction. (See also the numerical sketch below.)
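A numerical restatement of this example (a sketch, not a proof): any $y$ with $\|y\|_\infty\le 1$ satisfies the subgradient inequality $\|z\|_1\ge\langle y,z\rangle$, while a $y$ with $\|y\|_\infty>1$ violates it at a signed coordinate vector.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5

# Any y in the l_inf unit ball is a subgradient of ||.||_1 at 0
y = rng.uniform(-1.0, 1.0, n)
zs = rng.standard_normal((1000, n))
print(all(np.linalg.norm(z, 1) >= y @ z - 1e-12 for z in zs))   # True

# A vector with ||y||_inf > 1 is not: test z = sign(y_i)*e_i at the offending index
y_bad = y.copy()
y_bad[2] = 1.7
z = np.zeros(n)
z[2] = np.sign(y_bad[2])
print(np.linalg.norm(z, 1) >= y_bad @ z)    # False: 1 >= 1.7 fails
```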

Definition:

Let $f:\mathcal E→(−∞,∞]$ be an extended real-valued function. Define epigraph and effective domain as
$$
\begin{aligned} \operatorname{epi} f &=\{(x, \mu) \in \mathcal{E} \times \mathbb{R} \mid f(x) \leq \mu\} \\ \operatorname{dom} f &=\{x \in \mathcal{E} \mid f(x)<\infty\} \end{aligned}
$$

  1. $f$ is said to be convex (closed) if $\mathrm{epi} f$ is convex (closed).
  2. $f$ is said to be proper if $\mathrm{dom} f$ is nonempty.
  3. $f$ is said to be positively homogeneous if $f(λx) = λf(x)$ for all $x ∈ \mathcal E$ and $λ > 0$.
  4. Any norm function on $\mathcal E$ is positively homogeneous.

Definition: (Lipschitz)

Let $\mathcal O \subset \mathcal E_1$. A function $F:\mathcal O \rightarrow \mathcal E_2$ is Lipschitz continuous on $\mathcal O$ if there exists $L\ge 0$ such that $\|F(x)-F(y)\| \le L\|x-y\|$ for all $x,y\in \mathcal O$; it is locally Lipschitz on $\mathcal O$ if every point of $\mathcal O$ has a neighborhood on which $F$ is Lipschitz continuous.

Theorem (Rademacher’s theorem)

Let $\mathcal O$ be an open subset of $\mathcal E_1$ and let $F:\mathcal O→\mathcal E_2$ be locally Lipschitz on $\mathcal O$. Then $F$ is Fréchet differentiable almost everywhere on $\mathcal O$.

Proposition:

3.9 Fenchel conjugate

Fenchel conjugate:
$$
f^*(y) = \sup \{ \langle y,x \rangle - f(x) \mid x\in \mathcal E \}, \quad y\in \mathcal E
$$
$f^*$ is always closed and convex, even if $f$ is non-convex or not closed.

Proposition:

Let $f$ be a closed proper convex function on $\mathcal E$. For any $x\in \mathcal E$, we have the following equivalent conditions for a vector $x^* \in \mathcal E$:

  1. $f(x) + f^*(x^*) = \langle x, x^* \rangle$
  2. $x^* \in \partial f(x)$
  3. $x \in \partial f^*(x^*)$
  4. $\langle x, x^* \rangle - f(x) =\max_{z\in \mathcal E} \{\langle z, x^* \rangle - f(z)\}$
  5. $\langle x, x^* \rangle - f^*(x^*) =\max_{z^*\in \mathcal E} \{\langle x, z^* \rangle - f^*(z^*)\}$

Remark:

  • $C$ is a convex set. $\delta_C^*(x) = \sup \{ \langle x,y \rangle - \delta_C(y) \mid y\in \mathcal E \} = \sup \{ \langle x,y \rangle \mid y\in C \}$.
  • if $f:\mathcal E \rightarrow (-\infty, \infty]$ is a proper closed convex function, then $(f^*)^* = f$.
  • if $f:\mathcal E \rightarrow (-\infty, \infty]$ is a proper closed convex function, positively homogeneous and $f(0)=0$, then $f^* = \delta_C$, where $C=\partial f(0)$.

Example.

  1. $f(x) = \|x\|_\#$ is any norm function defined on $\mathcal E$ and $\| \cdot\|_*$ is the dual norm of $\| \cdot \|_\#$, i.e., for any $x\in \mathcal E$, $\|x\|_* = \sup_{y\in \mathcal E} \{\langle x, y \rangle \mid \|y\|_\# \le 1 \}$. Since $f$ is a positively homogeneous closed convex function, $f^*=\delta_C$ where
    $$
    C:= \partial f(0) = B_* := \{x\in \mathcal E \mid \|x\|_*\le 1 \}.
    $$

  2. $f:\mathbb R^n \rightarrow \mathbb R$ is defined by $f(x)=\max_{i=1,\cdots,n} x_i$. Then
    $$
    S:= \partial f(0) = \{ x\in \mathbb R^n \mid \sum_{i=1}^n x_i = 1, x\ge 0 \}.
    $$
    Since $f$ is positively homogeneous and convex, $f^* = \delta_S$. It is known that the projection of $x$ onto $S$ admits a fast algorithm with computational complexity $O(n)$.

  3. $g(x) = \max_{i=1,\cdots, n}x_i$ and $f(x)=g(x)+\delta_{\mathbb R_+^n}(x)$. $f(\cdot)$ is a positively homogeneous convex function. Then
    $$
    \partial f(0)=\partial g(0)+\partial \delta_{\mathbb{R}_{+}^{n}}(0)= S-\mathbb{R}_{+}^{n}=\left\{x \in \mathbb{R}^{n} \mid e^{T} x^{+} \leq 1\right\}
    $$
    where $S:= \partial g(0) = \{ x\in \mathbb R^n \mid \sum_{i=1}^n x_i = 1, x\ge 0 \}$ and $x_{i}^{+}:=\max \left(x_{i}, 0\right)$. Let $x \in \mathbb{R}^{n}$ be given. Define $I_{+}=\left\{i \mid x_{i} \geq 0\right\}$, $I_{-}=\left\{i \mid x_{i}<0\right\}$, and $C=\left\{\xi \in \mathbb{R}^{\left|I_{+}\right|} \mid e^{T} \xi \leq 1, \xi \geq 0\right\}$. The projection $\bar{x}=\Pi_{\partial f(0)}(x)$ is given by
    $$
    \bar{x}_{I_{+}}=\Pi_{C}\left(x_{I_{+}}\right), \qquad
    \bar{x}_{I_{-}}=x_{I_{-}}.
    $$
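The identity $f^*=\delta_{B_*}$ of example 1 can also be observed numerically: a crude grid approximation of $f^*(y)=\sup_x\{\langle y,x\rangle-\|x\|_1\}$ stays near $0$ when $\|y\|_\infty\le 1$ and grows with the box size otherwise. The grid and box size below are arbitrary choices.

```python
import numpy as np

def conj_l1_approx(y, box=10.0, m=2001):
    """Crude coordinate-separable approximation of sup_x <y, x> - ||x||_1."""
    grid = np.linspace(-box, box, m)
    # the sup separates across coordinates: sum_i max_t (y_i*t - |t|)
    return sum((yi * grid - np.abs(grid)).max() for yi in y)

print(conj_l1_approx(np.array([0.5, -0.9])))   # ~0  : inside the dual-norm ball
print(conj_l1_approx(np.array([1.4, 0.2])))    # ~4.0: (1.4 - 1)*box, grows with box
```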

3.10 Semismoothness

Definition (semismoothness)

semismooth:

strongly semismooth

Theorem.

Any convex function is semismooth

Piecewise smooth functions are semismooth functions.

3.11 Moreau-Yosida regularization and proximal mapping

Definition:

$f: \mathcal E \rightarrow (-\infty, \infty]$ is a closed proper convex function.
$$
\begin{array}{ll}
\text{MY regularization of } f \text{ at } x: & M_{f}(x)=\min_{y \in \mathcal{E}}\left\{\phi(y ; x):=f(y)+\frac{1}{2}\|y-x\|^{2}\right\} \\
\text{Proximal mapping of } f \text{ at } x: & P_{f}(x)=\operatorname{argmin}_{y \in \mathcal{E}}\left\{\phi(y ; x):=f(y)+\frac{1}{2}\|y-x\|^{2}\right\}
\end{array}
$$
We also have, for $\lambda>0$,
$$
\begin{array}{l}
M_{\lambda f}(x)=\min_{y \in \mathcal{E}}\left\{f(y)+\frac{1}{2\lambda}\|y-x\|^{2}\right\} \\
P_{\lambda f}(x)=\operatorname{argmin}_{y \in \mathcal{E}}\left\{f(y)+\frac{1}{2\lambda}\|y-x\|^{2}\right\}
\end{array}
$$
Proposition

  1. $P_f(x)$ exists and is unique.
  2. $M_f(x)\le f(x)$ for all $x\in \mathcal E$.

Description:

  • MY regularization is also referred to as the Moreau envelope of $f$ with parameter $λ$.

  • The Moreau envelope $M_f$ is essentially a smoothed or regularized form of $f$; it is finite everywhere (its domain is all of $\mathcal E$) even when $f$ is not.

  • It is continuously differentiable, even when $f$ is not.

  • The sets of minimizers of $f$ and $M_f$ are the same. The problems of minimizing $f$ and $M_f$ are thus equivalent, and the latter is always a smooth optimization problem ($M_f$ may be difficult to evaluate).
    $$
    \operatorname{argmin} \{f(x)\mid x\in \mathcal E\} = \operatorname{argmin} \{M_f(x)\mid x\in\mathcal E\}
    $$

  • To see why $M_f$ is a smoothed form of $f$, we have
    $$
    M_f = \left(f^* + \frac{1}{2} \|\cdot\|^2\right)^*
    $$
    Because $M_f^{**} = M_f$, this can be interpreted as obtaining a smooth approximation of a function by taking its conjugate, adding quadratic regularization, and then taking the conjugate again. With no regularization this would simply return the original function; with the quadratic regularization it gives a smooth approximation.

Basic Operations:

Theorem (Moreau decomposition)

Let $f$ be a closed proper convex function and $f^*$ its conjugate. Then
$$
\begin{aligned}
x&=P_f(x)+P_{f^*}(x), \quad \forall x\in \mathcal E \\
\frac{1}{2} \|x\|^2 &= M_f(x) + M_{f^*}(x)
\end{aligned}
$$
For any $t>0$,
$$
\begin{aligned}
x&=P_{tf}(x)+tP_{f^*/t}(x/t), \quad \forall x\in \mathcal E \\
\frac{1}{2} \|x\|^2 &= M_{tf}(x) + t^2 M_{f^*/t}(x/t)
\end{aligned}
$$

Examples:

  1. $f(x)=\lambda \|x\|_1$, $f^*(z)=\delta_C(z)$ where $C=\partial f(0)=\{z\in \mathbb R^n \mid \|z\|_\infty \le \lambda \}$

  2. $C$ is closed and convex. $f=\delta_C$,
    $$
    P_{\delta_{C}}(x)=\operatorname{argmin}_{y \in \mathcal{E}}\left\{\delta_{C}(y)+\frac{1}{2}\|y-x\|^{2}\right\}=\operatorname{argmin}_{y \in C} \frac{1}{2}\|y-x\|^{2}=\Pi_{C}(x)
    $$
    Suppose $C=\mathbb S_+^n$, the cone of $n\times n$ symmetric PSD matrices. Then
    $$
    \Pi_C(x)=Q\operatorname{diag}(\max\{d,0\}) Q^T \quad \text{using the spectral decomposition } x=Q \operatorname{diag}(d)Q^T
    $$
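For $f=\lambda\|\cdot\|_1$ the proximal mapping is the soft-thresholding operator, and $P_{f^*}$ is the projection onto the $\ell_\infty$-ball of radius $\lambda$ (example 1). The sketch below checks the Moreau decompositions $x=P_f(x)+P_{f^*}(x)$ and $M_f(x)+M_{f^*}(x)=\frac{1}{2}\|x\|^2$ on random data.

```python
import numpy as np

lam = 0.7
prox_l1 = lambda x: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)  # P_f, f = lam*||.||_1
proj_linf = lambda x: np.clip(x, -lam, lam)                        # P_{f*}, f* = delta_{||.||_inf <= lam}

def moreau_env(prox, func, x):
    """M_f(x) = f(P_f(x)) + 0.5*||P_f(x) - x||^2."""
    p = prox(x)
    return func(p) + 0.5 * np.linalg.norm(p - x) ** 2

rng = np.random.default_rng(9)
x = rng.standard_normal(6)

# Moreau decomposition of the identity
print(np.allclose(prox_l1(x) + proj_linf(x), x))          # True

# Moreau decomposition of 0.5*||x||^2
f = lambda z: lam * np.linalg.norm(z, 1)
f_conj = lambda z: 0.0                                     # delta_C evaluated on C
lhs = moreau_env(prox_l1, f, x) + moreau_env(proj_linf, f_conj, x)
print(np.isclose(lhs, 0.5 * np.linalg.norm(x) ** 2))       # True
```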

Theorem:

  1. Proximal mapping is nonexpansive, like projection.

    $P_f$ and $Q_f:=I-P_f$ are firmly nonexpansive, i.e.,
    $$
    \|P_f(x)-P_f(y)\|^2 \le \langle P_f(x)-P_f(y), x-y \rangle, \quad \forall x,y\in \mathcal E \\
    \|Q_f(x)-Q_f(y)\|^2 \le \langle Q_f(x)-Q_f(y), x-y \rangle, \quad \forall x,y\in \mathcal E
    $$
    Hence $\|P_f(x)-P_f(y)\| \le \|x-y\|$. That is, $P_f$ and $Q_f$ are Lipschitz continuous with modulus 1.

  2. $M_f$ is continuously differentiable and $\nabla M_f(x) = x-P_f(x), \forall x\in \mathcal E$.

  3. $\nabla M_{tf}(x)=tP_{f^*/t}(x/t)$ and $\nabla M_{f^*/t}(x)=t^{-1}P_{tf}(tx)$
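Property 2 can be checked by finite differences, e.g. for $f=\|\cdot\|_1$, whose proximal mapping is soft-thresholding with threshold $1$; the numerical gradient of $M_f$ should match $x-P_f(x)$.

```python
import numpy as np

prox = lambda x: np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)   # P_f for f = ||.||_1

def M(x):
    """Moreau envelope of ||.||_1 evaluated through its prox."""
    p = prox(x)
    return np.linalg.norm(p, 1) + 0.5 * np.linalg.norm(p - x) ** 2

rng = np.random.default_rng(10)
x = rng.standard_normal(5)
eps = 1e-6
num_grad = np.array([(M(x + eps * e) - M(x - eps * e)) / (2 * eps) for e in np.eye(5)])
print(np.allclose(num_grad, x - prox(x), atol=1e-5))              # True
```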

Proposition:

$\partial P_f(x)$ has the following properties:

  1. any $V\in \partial P_f(x)$ is self-adjoint.
  2. $\langle Vd,d \rangle \ge \|Vd\|^2$ for any $V\in \partial P_f(x)$ and $d\in \mathcal E$.

3.12 Directional Derivatives of Matrix-Valued Functions