Convex functions#

This section is devoted to convex sets and convex functions, which have useful properties for optimization algorithms.

Some special sets#

Lines and line segments#

Points \(y\) on a line through two points \(x_1 \neq x_2\) in \(\mathbb{R}^{n}\) can be expressed by

\[ y = \theta x_1 + (1 - \theta) x_2, \]

or

\[ y = x_2 + \theta (x_1 - x_2), \]

where \(\theta \in \mathbb{R}\).

For a line segment, \(\theta\) is limited to \(0 \leq \theta \leq 1\).

x1 = [0.0; 0.5];
x2 = [1.0; 0.0];
theta = linspace (-0.4, 1.4, 100);
y = @(t) x2 + t .* (x1 - x2);
yy = y(theta);
plot (yy(1,:), yy(2,:), 'k');
hold on;
tprops = {'FontSize', 18};
mprops = {'MarkerSize', 9, ...
          'MarkerFaceColor', 'black'};
for t = [1.2, 1, 0.6, 0, -0.2]
  yy = y(t);
  plot (yy(1), yy(2), 'kx', mprops{:});
  text (yy(1) - 0.22, yy(2) - 0.05, ...
        sprintf ('\\theta = %.1f', t), tprops{:});
end
plot (x1(1), x1(2), 'ko', mprops{:});
text (x1(1), x1(2) + 0.1, 'x1', tprops{:});
plot (x2(1), x2(2), 'ko', mprops{:});
text (x2(1), x2(2) + 0.1, 'x2', tprops{:});
axis equal;
axis off;
_images/nlo-lecture-05-convex-functions_2_0.png

Affine sets#

An affine set contains the line through any two distinct points in the set.

For example the solution set of linear equations \(\{x | Ax = b\}\) is affine.

Conversely, every affine set can be expressed as solution set of a system of linear equations.

Convex sets#

A convex set \(C\) contains the line segment through any two distinct points in the set.

\[ x_1, x_2 \in C, \quad 0 \leq \theta \leq 1 \quad\implies\quad \theta x_1 + (1 - \theta) x_2 \in C. \]

Examples (from [BV04] (p. 24): first set is convex, the latter two not)

convex_sets

Hyperplanes#

Affine and convex sets of the form \(\{ x \in \mathbb{R}^{n} \colon a^{T}x = b \}\) with normal vector \(a \neq 0\).

x1 = [0, 0.7];
x2 = [1, 0.0];
x0 = x1 + 0.5 .* (x2 - x1);
x = 0.25 .* (x2 - x1);
a = [0.15, 0.15 / x1(2)];
plot ([x1(1), x2(1)], [x1(2), x2(2)], 'k-');
hold on;
plot (x0(1), x0(2), 'ko');
quiver (x0(1), x0(2), a(1), a(2), 'k', 'LineWidth', 2);
quiver (x0(1), x0(2), x(1), x(2), 'k', 'LineWidth', 2);
tprops = {'FontSize', 18};
text (x0(1) - 0.04, x0(2) - 0.04, 'x_0', tprops{:});
text (x0(1) + a(1) + 0.02, x0(2) + a(2) + 0.02, 'a', tprops{:});
text (x0(1) + x(1) + 0.02, x0(2) + x(2) + 0.02, 'x', tprops{:});
text (x2(1) - 0.2, x2(2) + 0.02, 'a^{T}x = b', tprops{:});
axis equal;
axis off;
_images/nlo-lecture-05-convex-functions_6_0.png

Halfspaces#

Convex sets of the form \(\{ x \in \mathbb{R}^{n} \colon a^{T}x \leq b \}\) with normal vector \(a \neq 0\).

x1 = [0, 0.7];
x2 = [1, 0.0];
x0 = x1 + 0.5 .* (x2 - x1);
a = [0.15, 0.15 / x1(2)];
x3 = x1 - a;
x4 = x2 - a;
fill ([x1(1), x2(1), x4(1), x3(1)], [x1(2), x2(2), x4(2), x3(2)], ...
      [220, 220, 220] / 256, 'EdgeColor', 'none');
hold on;
plot ([x1(1), x2(1)], [x1(2), x2(2)], 'k-', 'LineWidth', 2);
plot (x0(1), x0(2), 'ko');
quiver (x0(1), x0(2), a(1), a(2), 'k', 'LineWidth', 2);
tprops = {'FontSize', 18};
text (x0(1) - 0.04, x0(2) - 0.04, 'x_0', tprops{:});
text (x0(1) + a(1) + 0.02, x0(2) + a(2) + 0.02, 'a', tprops{:});
text (x2(1) - 0.2, x2(2) + 0.2, 'a^{T}x \geq b', tprops{:});
text (x2(1) - 0.3, x2(2),       'a^{T}x \leq b', tprops{:});
axis equal;
axis off;
_images/nlo-lecture-05-convex-functions_8_0.png

Ellipsoid#

Convex sets of the form

\[ \{ x \in \mathbb{R}^{n} \colon (x - x_{c})^{T} P^{-1} (x - x_{c}) \leq 1 \}, \]

with center \(x_{c} \in \mathbb{R}^{n}\) and \(P \succ 0\) (symmetric positive definite) or

\[ \{ x_{c} + Au \in \mathbb{R}^{n} \colon \lVert u \rVert \leq 1 \}, \]

with \(A\) square and non-singular.

rx = 2;
ry = 0.7;
rz = 1;
[x, y, z] = ellipsoid (0, 0, 0, rx, ry, rz, 200);
colormap ('gray');
surf (x, y, z);
hold on;
plot3 ([0, rx], [0, 0], [1, 1], 'k-', 'LineWidth', 2);
plot3 ([0, 0], [0, ry], [1, 1], 'k-', 'LineWidth', 2);
plot3 (0, 0, 1, 'ko', 'MarkerSize', 9);
text  (-0.1, -0.1, 1, 'x_{c}', 'FontSize', 18);
shading flat;
view (0, 90);  % reduce to x-y-plane
xlim ([-rx, rx]);
ylim ([-1, 1]);
zlim ([-rz, rz]);
axis off;
_images/nlo-lecture-05-convex-functions_10_0.png

Polyhedra#

A Polyhedron is an intersection of a finite number of halfspaces and hyperplanes.

x1 = [ 0.0, 0.0];
x2 = [ 0.3, 0.6];
x3 = [-0.2, 0.9];
x4 = [-0.6, 0.5];
x5 = [-0.5, 0.2];
x = [x1; x2; x3; x4; x5];
fill (x(:,1), x(:,2), ...
      [220, 220, 220] / 256, ...
      'EdgeColor', 'none');
hold on;

tprops = {'FontSize', 18};
for i = 1:5
  j = mod (i, 5) + 1;
  p = @(t) x(i,:)' + t .* (x(j,:)' - x(i,:)');
  p0 = p(0.5);  % Start of normal vector
  p1 = p(-0.2);
  p2 = p(1.2);
  % Normal vector a
  d = 0.3;  % direction and scale
  if (x(i,1) < 0)
    d = -d;
  end
  a = (x(j,:)' - x(i,:)');
  a = [1; -a(1)/a(2)];
  a = a / norm (a) * d;
  plot ([p1(1), p2(1)], [p1(2), p2(2)], 'k-', 'LineWidth', 2);
  quiver (p0(1), p0(2), a(1), a(2), 'k', 'LineWidth', 2);
  t = p0 + 1.2 .* a;
  text (t(1), t(2), sprintf ('a_{%d}', i), tprops{:});
end

text (-0.2, 0.45, 'P', tprops{:});
axis equal;
axis off;
_images/nlo-lecture-05-convex-functions_12_0.png

The intersection of convex sets is convex.

Proof.

Let \(x_{1}, x_{2} \in C \cap D\), where \(C\) and \(D\) are convex sets. Then \(\theta x_{1} + (1 - \theta) x_{2} \in C \cap D\).

\[\tag*{$\Box$}\]

Solution set of finitely many linear inequalities and equalities is convex

\[ Ax \leq b, \qquad Cx = d, \]

where \(A \in \mathbb{R}^{m \times n}\), \(C \in \mathbb{R}^{p \times n}\), and \(\leq\) is component-wise inequality.

Convex functions#

Let \(X \subset \mathbb{R}^{n}\) be a convex set. A function \(f \colon \mathbb{R}^{n} \to \mathbb{R}\) is convex on \(X\), if

\[ f( \theta x + (1 - \theta) y ) \;\leq\; \theta f(x) + (1 - \theta)f(y) \]

for all \(x,y \in X\) and \(0 \leq \theta \leq 1\).

x = linspace (-0.7, 0.9, 50);
y = x.^2;
plot (x, y, 'k');
hold on;
plot ([-0.5, 0.7], [-0.5, 0.7].^2, 'ko-', 'LineWidth', 2);
text (-0.75, 0.2, '(x, f(x))', 'FontSize', 18);
text ( 0.75, 0.5, '(y, f(y))', 'FontSize', 18);
axis off;
_images/nlo-lecture-05-convex-functions_15_0.png
  • \(f\) is concave if \(-f\) is convex.

  • \(f\) is strictly convex if \(X \subset \mathbb{R}^{n}\) is a convex set and

    \[ f( \theta x + (1 - \theta) y ) \;<\; \theta f(x) + (1 - \theta)f(y), \]

    for all \(x,y \in X\), \(x \neq y\), and \(0 < \theta < 1\).

Examples on \(\mathbb{R}\)#

Convex:

  • affine: \(ax + b\) on \(\mathbb{R}\), for any \(a, b \in \mathbb{R}\)

  • exponential: \(e^{ax}\), for any \(a \in \mathbb{R}\)

  • powers: \(x^{\alpha}\) on \(\mathbb{R}_{++}\), for \(\alpha \geq 1\) or \(\alpha \leq 0\)

  • powers of absolute value: \(\lvert x \rvert^{p}\) on \(\mathbb{R}\), for \(p \geq 1\)

  • negative entropy: \(x\log(x)\) on \(\mathbb{R}_{++}\)

Concave:

  • affine: \(ax + b\) on \(\mathbb{R}\), for any \(a, b \in \mathbb{R}\)

  • powers: \(x^{\alpha}\) on \(\mathbb{R}_{++}\), for \(0 \leq \alpha \leq 1\)

  • logarithm: \(\log(x)\) on \(\mathbb{R}_{++}\)

Examples on \(\mathbb{R}^{n}\) (vectors)#

  • affine function \(f(x) = a^{T}x + b\) are convex and concave

  • norms are convex: \(\lVert x \rVert_{p} = (\sum_{i = 1}^{n} \lvert x_i \rvert^{p} )^{1/p}\) for \(p \geq 1\); \(\lVert x \rVert_{\infty} = \max(\lvert x_1 \rvert, \ldots, \lvert x_n \rvert)\)

Examples on \(\mathbb{R}^{m \times n}\) (\(m \times n\) matrices)#

  • affine function

    \[ f(X) = \operatorname{tr} \left( A^{T}X \right) + b = \left( \sum_{i = 1}^{m}\sum_{j = 1}^{n} A_{ij} X_{ij} \right) + b \]
  • spectral (maximum singular value) norm

    \[ f(X) = \lVert X \rVert_{2} = \sigma_{\max}(X) = \sqrt{\lambda_{\max}(X^{T}X)}. \]

Theorem 6: Unique global minimum

Let \(X \subset \mathbb{R}^{n}\) be a convex set and \(f \colon X \to \mathbb{R}\) a convex function, then each local minimum is a global minimum.

Furthermore, if \(f \in C^{1}\) (continuously differentiable) over an open and convex set \(X \subset \mathbb{R}^{n}\), then every stationary point is a global minimum of \(f\) over \(X\).

Proof:

Assume \(x^{*}\) is a local, but no global minimum. Then there is \(y \in X\) such that \(f(y) < f(x^{*})\). For \(0 < \theta \leq 1\) it follows

\[ f(x^{*} + \theta (y - x^{*})) \leq f(x^{*}) + \theta (f(y) - f(x^{*})) < f(x^{*}) + \theta (f(x^{*}) - f(x^{*})) = f(x^{*}) \]

Thus in every neighborhood of \(x^{*}\) there exist points with smaller objective function value than \(f(x^{*})\). Therefore \(x^{*}\) cannot be a local minimum, contradiction.

Regarding the second statement, for \(y \in X\) assume the helper function

\[ \Psi(\theta) := f(x^{*}) + \theta (f(y) - f(x^{*})) - f(x^{*} + \theta (y - x^{*})) \geq 0. \]

\(\Psi\) is continuously differentiable and for \(0 \leq \theta \leq 1\) not negative. Furthermore \(\Psi(0) = 0\). Therefore

\[ \Psi'(0) = (f(y) - f(x^{*})) - \nabla f(x^{*})^{T}(y - x^{*}) \geq 0. \]

If \(\nabla f(x^{*}) = 0\), then \(f(y) - f(x^{*}) \geq 0\) for all \(y \in X\). Thus \(x^{*}\) is a global minimum of \(f\) over \(X\).

\[\tag*{$\Box$}\]

1st order convex function condition#

Theorem 7: 1st order convex function condition

Let \(X \subset \mathbb{R}^{n}\) be a non-empty, convex set and \(f \in C^{1}(\mathbb{R}^{n}, \mathbb{R})\) is convex over \(X\), then

\[ f(y) \;\geq\; f(x) + \nabla f(x)^{T}(y - x) \]

for all \(x,y \in X\).

Proof:
  1. Assume \(f\) is convex over \(X\). For \(y \in X\), \(0 \leq \theta \leq 1\) the function

    \[ \Psi(\theta) := f(x) + \theta (f(y) - f(x)) - f(x + \theta (y - x)) \geq 0 \]

    is continuously differentiable and \(\Psi(0) = 0\). Therefore

    \[ \Psi'(0) = (f(y) - f(x)) - \nabla f(x)^{T}(y - x) \geq 0. \]
  2. Assume \(f(y) \geq f(x) + \nabla f(x)^{T}(y - x)\) holds. Using \(\bar{x} := x + \theta(y - x)\)

    \[ f(x) \geq f(\bar{x}) + \nabla f(\bar{x})^{T}(x - \bar{x}) \]
    \[ f(y) \geq f(\bar{x}) + \nabla f(\bar{x})^{T}(y - \bar{x}) \]

    Multiply the first inequality with \((1 - \theta)\), the second inequality with \(\theta\), and finally add both inequalities, then the assertion follows:

    \[ (1 - \theta) f(x) + \theta f(y) \geq f(\bar{x}) + 0 = f(x + \theta(y - x)). \]
\[\tag*{$\Box$}\]
x = linspace (-0.7, 0.9, 50);
y = x.^2;
plot (x, y, 'k');
hold on;
plot ([0.1, 0.9], 0.8 .* [0.1, 0.9] - 0.16, 'k-', 'LineWidth', 2);
plot (0.4, 0.4^2, 'ko-', 'LineWidth', 2);
text (-0.6, 0.5, 'f(y)', 'FontSize', 18);
text ( 0.4, 0.1, '(x, f(x))', 'FontSize', 18);
text ( 0.9, 0.5, 'f(x) + \nabla f(x)^{T}(y - x)', 'FontSize', 18);
xlim ([-0.7, 1.3]);
axis off;
_images/nlo-lecture-05-convex-functions_24_0.png

The first-order approximation of \(f\) is global underestimator.

Theorem 7 can be extended to strict convexity, which is omitted here.

2nd order convex function condition#

Theorem 8: 2nd order convex function condition

Let \(X \subset \mathbb{R}^{n}\) be a open, non-empty, convex set and \(f \in C^{2}(\mathbb{R}^{n}, \mathbb{R})\) is convex over \(X\), then

\[ \nabla^{2} f(x) \succeq 0 \quad \text{(positive semi-definite)} \]

for all \(x \in X\).

Proof:
  1. Assume \(f\) is convex over \(X\). According to Theorem 7 and Taylor’s Theorem for \(x \in X\), \(d \in \mathbb{R}^{n}\), and sufficiently small \(t > 0\) there is

    \[ 0 \leq f(x + td) - f(x) - t\nabla f(x)^{T}d = \frac{t^2}{2} d^{T}\nabla^{2}f(x)d + o(t^2). \]

    Division by \(\frac{t^2}{2}\) and taking the limit \(t \to 0\) results in \(d^{T}\nabla^{2}f(x)d \geq 0\), i.e. the Hessian matrix must be positive semi-definite. Note that \(X\) is an open set. Thus \(x + td \in X\) holds for all sufficiently small \(t > 0\).

  2. Assume \(\nabla^{2}f(x) \succeq 0\) for all \(x \in X\). According to Taylor’s Theorem there is for \(x,y \in X\):

    \[ f(y) = f(x) + \nabla f(x)^{T}(y - x) + \frac{1}{2}(y - x)^{T} \nabla^{2} f( x + \theta (y - x) ) (y - x) \]

    with \(0 < \theta < 1\) depending on \(x\) and \(y\). Because the Hessian matrix is positive semi-definite

    \[ f(y) \geq f(x) + \nabla f(x)^{T}(y - x) \]

    holds and therefore the convexity of \(f\) over \(X\) according to Theorem 7.

\[\tag*{$\Box$}\]

Again, the extension of Theorem 8 to strict convexity is omitted here.

Examples convex function condition#

  • Quadratic function: \(f(x) = \frac{1}{2}x^{T}Ax + b^{T}x + c\), where \(A\) is a symmetric \(n \times n\) matrix, \(x,b \in \mathbb{R}^{n}\), and \(c \in \mathbb{R}\).

    \[ \nabla f(x) = Ax + b \quad\text{and}\quad \nabla^{2} f(x) = A. \]

    If \(A \succeq 0\), then \(f\) is a convex function.

  • Least-squares objective function: \(f(x) = \lVert Ax - b \rVert_{2}^{2}\), with \(A \in \mathbb{R}^{m \times n}\), \(x \in \mathbb{R}^{n}\), and \(b \in \mathbb{R}^{m}\).

    \[ \nabla f(x) = 2A^{T}(Ax - b) \quad\text{and}\quad \nabla^{2} f(x) = 2A^{T}A \]

    is a convex function for any \(A\)!

  • Quadratic-over-linear: \(f(x,y) = x^2 / y\).

    \[\begin{split} \nabla^{2} f(x,y) = \frac{2}{y^3} \begin{pmatrix} y \\ -x \end{pmatrix} \begin{pmatrix} y \\ -x \end{pmatrix}^{T} \succeq 0 \end{split}\]

    is a convex function for \(y > 0\).

% Following equivalent to (x,y,z := x^2/y) for plotting.
[y, z] = meshgrid (0:.08:2);
x = sqrt(y.*z);
f = @(y, z) sqrt (x .* z);
mesh (f(y,z), y, z);
hold on;
mesh (-f(y,z), y, z);
grid on;
xlabel ('x');
ylabel ('y');
zlabel ('f(x,y)');
title ('f(x,y) = x^2 / y');
_images/nlo-lecture-05-convex-functions_32_0.png

Operations that preserve convexity#

  • Non-negative multiple: \(\alpha f\) is convex, if \(f\) is convex and \(\alpha \geq 0\).

  • Sum: \(f_1 + f_2\) is convex, if \(f_1\) and \(f_2\) are convex. This extends to infinite sums and integrals.

  • Composition with affine function: f(Ax + b) is convex, if \(f\) is convex.

  • Point-wise maximum: \(f(x) = \max\{ f_1(x), \ldots, f_m(x) \}\) is convex, if \(f_1, \ldots, f_m\) are convex.

More examples for convex functions:

  • (Any) norm of an affine function: \(f(x) = \lVert Ax + b \rVert\).

  • Log-barrier-function for expressing linear inequalities:

    \[ f(x) = -\sum_{i = 1}^{m} \log\left( b_i - a_i^{T}x \right), \quad \operatorname{dom}(f) = \left\{ x \colon a_i^{T}x < b_i, i = 1, \ldots, m \right\}. \]
  • Piecewise-linear-function: \(f(x) = \max_{i = 1, \ldots, m} \left( b_i - a_i^{T}x \right)\).

For many more examples, see [BV04] (chapters 2 and 3).