Constrained minimization algorithms#

Some of the algorithms and techniques used to find local minima of unconstrained optimization problems can be applied, mutatis mutandis, to the constrained case.

Sequential Quadratic Programming (SQP)#

For the sake of simplicity, only the equality-constrained minimization problem is considered:

\[\begin{split} \begin{array}{lll} \textrm{minimize} & f(x) & \\ \textrm{subject to} & h_{j}(x) = 0, & j = 1, \ldots, p, \\ \end{array} \end{split}\]

with optimization variable \(x \in \mathbb{R}^{n}\) and twice continuously differentiable objective and constraint functions \(f, h_{j} \in C^{2}(\mathbb{R}^{n}, \mathbb{R})\), \(h := (h_{1}, \ldots, h_{p})^{T} \in C^{2}(\mathbb{R}^{n}, \mathbb{R}^{p})\).

The corresponding Lagrangian is:

\[ L(x, \mu) := f(x) + \sum_{j = 1}^{p} \mu_{j} h_{j}(x), \]

and the KKT optimality conditions are:

\[\begin{split} \begin{aligned} \nabla_{x} L(x^{*}, \mu^{*}) &=\; \nabla f(x^{*}) + \sum_{j = 1}^{p} \mu_{j}^{*} \nabla h_{j}(x^{*}) &=\; 0 & \\ \nabla_{\mu} L(x^{*}, \mu^{*}) &=\; ( h_{j}(x^{*}) ) = h(x^{*}) &=\; 0 &, \quad j = 1, \ldots, p \end{aligned} \end{split}\]
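To make these definitions concrete, the following minimal sketch sets up the Lagrangian and the KKT conditions symbolically with SymPy for an illustrative example problem, minimize \(x_{1} + x_{2}\) subject to \(x_{1}^{2} + x_{2}^{2} - 2 = 0\); the problem instance is an assumption chosen purely for demonstration.

```python
# Illustrative example (assumed problem instance):
#   minimize x1 + x2   subject to   x1^2 + x2^2 - 2 = 0
import sympy as sp

x1, x2, mu = sp.symbols("x1 x2 mu", real=True)
f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + mu * h                                  # Lagrangian L(x, mu) = f(x) + mu * h(x)

# KKT optimality conditions: grad_x L = 0 and h = 0.
kkt = [sp.diff(L, x1), sp.diff(L, x2), h]
stationary = sp.solve(kkt, [x1, x2, mu], dict=True)
print(stationary)
# Two KKT points: x = (-1, -1) with mu = 1/2 (the constrained minimum)
# and x = (1, 1) with mu = -1/2 (the constrained maximum).
```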

Again, the goal is to perform an iteration

\[ \mathbf{x}^{k+1} = \mathbf{x}^{k} + \alpha \mathbf{d}^{k} \]

with \(\mathbf{x} = (x, \mu)^{T}\), which converges from a starting point \(\mathbf{x}^{0}\) to a local minimum \(\mathbf{x}^{*}\) (a stationary point of the KKT optimality conditions), choosing a descent direction \(\mathbf{d} = (\Delta x, \Delta \mu)^{T}\) and a step size \(\alpha\) in each step (cf. Gradient methods).

Using Newton’s method (a quadratic Taylor approximation of the Lagrangian \(L(x, \mu)\)) to determine the descent direction results in solving the following linear system in each iteration:

\[ \nabla^{2} L(x, \mu) \mathbf{d} = -\nabla L(x, \mu) \]
\[\begin{split} \begin{pmatrix} \nabla^{2}_{x,x} L(x, \mu) & \nabla^{2}_{x,\mu} L(x, \mu) \\ \nabla^{2}_{\mu,x} L(x, \mu) & \nabla^{2}_{\mu,\mu} L(x, \mu) \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \mu \end{pmatrix} = -\begin{pmatrix} \nabla_{x} L(x, \mu) \\ \nabla_{\mu} L(x, \mu) \end{pmatrix} \end{split}\]

This Hessian matrix has a particular block structure, so the system can be written as:

\[\begin{split} \begin{pmatrix} Q & B^{T} \\ B & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \mu \end{pmatrix} = -\begin{pmatrix} \nabla f(x) + B^{T} \mu \\ h(x) \end{pmatrix} = -\begin{pmatrix} \nabla_{x} L(x, \mu) \\ h(x) \end{pmatrix}, \end{split}\]

where:

  • \(Q := \nabla^{2}_{x,x} L(x, \mu) = \nabla^{2} f(x) + \sum_{j = 1}^{p} \mu_{j} \nabla^{2} h_{j}(x)\).

  • \(B := \nabla^{2}_{\mu,x} L(x, \mu) = \begin{pmatrix} \nabla^{T} h_{1}(x) \\ \vdots \\ \nabla^{T} h_{p}(x) \end{pmatrix}\) is the Jacobian matrix of \(h(x)\).

  • \(B^{T} = \nabla^{2}_{x,\mu} L(x, \mu) = \begin{pmatrix} \nabla h_{1}(x) & \ldots & \nabla h_{p}(x) \end{pmatrix}\) is the transposed Jacobian matrix of \(h(x)\).

  • \(\nabla^{2}_{\mu,\mu} L(x, \mu) = 0\) is a zero matrix, as \(L(x, \mu)\) is linear in \(\mu\).
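As a numerical illustration of the block system above, the following NumPy sketch assembles the KKT matrix and performs a single full Newton step for the assumed example problem from before, minimize \(x_{1} + x_{2}\) subject to \(x_{1}^{2} + x_{2}^{2} - 2 = 0\):

```python
# One Newton-KKT step for the assumed example problem
#   minimize x1 + x2   subject to   h(x) = x1^2 + x2^2 - 2 = 0,
# whose solution is x* = (-1, -1), mu* = 1/2.
import numpy as np

def grad_f(x):                          # gradient of the (linear) objective
    return np.array([1.0, 1.0])

def h(x):                               # equality constraint as a 1-vector
    return np.array([x[0]**2 + x[1]**2 - 2.0])

def jac_h(x):                           # Jacobian B of h, shape (1, 2)
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

def hess_L(x, mu):                      # Q = hess f + sum_j mu_j * hess h_j (hess f = 0)
    return mu[0] * 2.0 * np.eye(2)

x, mu = np.array([-2.0, -1.0]), np.array([1.0])

B, Q = jac_h(x), hess_L(x, mu)
KKT = np.block([[Q, B.T], [B, np.zeros((1, 1))]])       # [[Q, B^T], [B, 0]]
rhs = -np.concatenate([grad_f(x) + B.T @ mu, h(x)])     # -[grad_x L, h]
d = np.linalg.solve(KKT, rhs)                           # d = (Delta x, Delta mu)
x, mu = x + d[:2], mu + d[2:]                           # full step, alpha = 1
print(x, mu)                            # iterate moves toward (-1, -1), 0.5
```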

The derived Newton equation is identical to the KKT optimality conditions of the following quadratic optimization problem:

(2)#\[\begin{split} \begin{array}{lll} \textrm{minimize} & F(\Delta x) &:= \frac{1}{2} (\Delta x)^{T} Q \Delta x + \nabla_{x} L(x,\mu)^{T} \Delta x \\ \textrm{subject to} & H(\Delta x) &:= B \Delta x + h(x) = 0. \end{array} \end{split}\]

Introducing Lagrange multipliers \(\lambda\) for the equality constraints (which will turn out to equal \(\Delta \mu\)), the corresponding Lagrangian is:

\[ \mathcal{L}(\Delta x,\lambda) = F(\Delta x) + H(\Delta x)^{T} \lambda \]

and the KKT optimality conditions are:

\[\begin{split} \begin{aligned} \nabla_{\Delta x} \mathcal{L}(\Delta x,\lambda) &= Q \Delta x + \nabla_{x} L(x,\mu) + B^{T} \lambda = 0, \\ \nabla_{\lambda} \mathcal{L}(\Delta x,\lambda) &= H(\Delta x) = B \Delta x + h(x) = 0. \end{aligned} \end{split}\]

or, written in matrix notation and identifying \(\lambda = \Delta \mu\):

\[\begin{split} \begin{pmatrix} Q & B^{T} \\ B & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \mu \end{pmatrix} = -\begin{pmatrix} \nabla_{x} L(x,\mu) \\ h(x) \end{pmatrix}. \end{split}\]
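Repeating this step gives the basic full-step (\(\alpha = 1\)) SQP iteration. The following sketch, again on the assumed example problem, solves the quadratic sub-problem (2) in each iteration via its KKT system:

```python
# Full-step SQP iteration for the assumed example problem
#   minimize x1 + x2   subject to   x1^2 + x2^2 - 2 = 0.
import numpy as np

grad_f = lambda x: np.array([1.0, 1.0])
h      = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
jac_h  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
hess_L = lambda x, mu: mu[0] * 2.0 * np.eye(2)      # hess f = 0 (linear objective)

x, mu = np.array([-2.0, -1.0]), np.array([1.0])
for k in range(20):
    B, Q = jac_h(x), hess_L(x, mu)
    grad_L = grad_f(x) + B.T @ mu
    if np.linalg.norm(grad_L) < 1e-10 and np.linalg.norm(h(x)) < 1e-10:
        break                                        # KKT conditions satisfied
    KKT = np.block([[Q, B.T], [B, np.zeros((1, 1))]])
    d = np.linalg.solve(KKT, -np.concatenate([grad_L, h(x)]))
    x, mu = x + d[:2], mu + d[2:]                    # Newton step with alpha = 1

print(k, x, mu)      # converges to x* = (-1, -1), mu* = 0.5 in a few iterations
```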

Summary#

  • SQP methods solve a sequence of quadratic optimization sub-problems (quadratic objective function and linearized constraints).

  • In the unconstrained case, SQP reduces to Newton’s method for finding a stationary point \(\nabla f(x) = 0\).

  • In the purely equality-constrained case, the SQP method is equivalent to applying Newton’s method to the KKT optimality conditions.

  • Extensions for equality and inequality constraints exist, but are not discussed in this seminar.

  • The same problems and limitations discussed in the section about Newton’s method also apply to the SQP method:

    • Computing the Hessian matrix is expensive.

    • If the Hessian matrix is not symmetric positive definite or the constraint gradients are not linearly independent (LICQ violated) during the iteration, convergence cannot be guaranteed.

  • See exercise RM04 for an application example.
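As a practical cross-check (separate from exercise RM04), SciPy’s SLSQP method, an SQP variant that also handles inequality constraints, can be applied to the same assumed example problem:

```python
# SciPy's SLSQP applied to the assumed example problem
#   minimize x1 + x2   subject to   x1^2 + x2^2 - 2 = 0.
import numpy as np
from scipy.optimize import minimize

res = minimize(
    fun=lambda x: x[0] + x[1],                            # objective f
    x0=np.array([-2.0, -1.0]),                            # starting point
    method="SLSQP",
    constraints=[{"type": "eq",                           # equality constraint h(x) = 0
                  "fun": lambda x: x[0]**2 + x[1]**2 - 2.0}],
)
print(res.x)        # approximately (-1, -1)
```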