Constrained minimization algorithms#

Some of the algorithms and techniques used to find local minima of unconstrained optimization problems can be applied, mutatis mutandis, to the constrained case.

Sequential Quadratic Programming (SQP)#

For the sake of simplicity, only the equality-constrained minimization problem is considered:

\[\begin{split} \begin{array}{lll} \textrm{minimize} & f(x) & \\ \textrm{subject to} & h_{j}(x) = 0, & j = 1, \ldots, p, \\ \end{array} \end{split}\]

with optimization variable \(x \in \mathbb{R}^{n}\) and twice continuously differentiable objective and constraint functions \(f, h_{j} \in C^{2}(\mathbb{R}^{n}, \mathbb{R})\), \(h := (h_{1}, \ldots, h_{p})^{T} \in C^{2}(\mathbb{R}^{n}, \mathbb{R}^{p})\).

The corresponding Lagrangian is:

\[ L(x, \mu) := f(x) + \sum_{j = 1}^{p} \mu_{j} h_{j}(x), \]

and the KKT optimality conditions are:

\[\begin{split} \begin{aligned} \nabla_{x} L(x^{*}, \mu^{*}) &=\; \nabla f(x^{*}) + \sum_{j = 1}^{p} \mu_{j}^{*} \nabla h_{j}(x^{*}) &=\; 0 & \\ \nabla_{\mu} L(x^{*}, \mu^{*}) &=\; ( h_{j}(x^{*}) ) = h(x^{*}) &=\; 0 &, \quad j = 1, \ldots, p \end{aligned} \end{split}\]
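To make these definitions concrete, the following minimal sketch sets up the Lagrangian and the KKT conditions symbolically with SymPy for an illustrative example problem, minimize \(x_{1} + x_{2}\) subject to \(x_{1}^{2} + x_{2}^{2} - 2 = 0\); the problem instance is an assumption chosen purely for demonstration.

```python
# Illustrative example (assumed problem instance):
#   minimize x1 + x2   subject to   x1^2 + x2^2 - 2 = 0
import sympy as sp

x1, x2, mu = sp.symbols("x1 x2 mu", real=True)
f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + mu * h                                  # Lagrangian L(x, mu) = f(x) + mu * h(x)

# KKT optimality conditions: grad_x L = 0 and h = 0.
kkt = [sp.diff(L, x1), sp.diff(L, x2), h]
stationary = sp.solve(kkt, [x1, x2, mu], dict=True)
print(stationary)
# Two KKT points: x = (-1, -1) with mu = 1/2 (the constrained minimum)
# and x = (1, 1) with mu = -1/2 (the constrained maximum).
```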

Again, the goal is to perform an iteration

\[ \mathbf{x}^{k+1} = \mathbf{x}^{k} + \alpha \mathbf{d}^{k} \]

with \(\mathbf{x} = (x, \mu)^{T}\), which converges from a starting point \(\mathbf{x}^{0}\) to a local minimum \(\mathbf{x}^{*}\) (a stationary point of the KKT optimality conditions), choosing a descent direction \(\mathbf{d} = (\Delta x, \Delta \mu)^{T}\) and a step size \(\alpha\) in each step (cf. Gradient methods).

Using Newton’s method (a quadratic Taylor approximation of the Lagrangian \(L(x, \mu)\)) to determine the descent direction results in solving the following linear system in each iteration:

\[ \nabla^{2} L(x, \mu) \mathbf{d} = -\nabla L(x, \mu) \]
\[\begin{split} \begin{pmatrix} \nabla^{2}_{x,x} L(x, \mu) & \nabla^{2}_{x,\mu} L(x, \mu) \\ \nabla^{2}_{\mu,x} L(x, \mu) & \nabla^{2}_{\mu,\mu} L(x, \mu) \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \mu \end{pmatrix} = -\begin{pmatrix} \nabla_{x} L(x, \mu) \\ \nabla_{\mu} L(x, \mu) \end{pmatrix} \end{split}\]

This Hessian matrix has a particular block structure, so the system can be written as:

\[\begin{split} \begin{pmatrix} Q & B^{T} \\ B & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \mu \end{pmatrix} = -\begin{pmatrix} \nabla f(x) + B^{T} \mu \\ h(x) \end{pmatrix} = -\begin{pmatrix} \nabla_{x} L(x, \mu) \\ h(x) \end{pmatrix}, \end{split}\]

where:

  • \(Q := \nabla^{2}_{x,x} L(x, \mu) = \nabla^{2} f(x) + \sum_{j = 1}^{p} \mu_{j} \nabla^{2} h_{j}(x)\).

  • \(B := \nabla^{2}_{\mu,x} L(x, \mu) = \begin{pmatrix} \nabla^{T} h_{1}(x) \\ \vdots \\ \nabla^{T} h_{p}(x) \end{pmatrix}\) is the Jacobian matrix of \(h(x)\).

  • \(B^{T} = \nabla^{2}_{x,\mu} L(x, \mu) = \begin{pmatrix} \nabla h_{1}(x) & \ldots & \nabla h_{p}(x) \end{pmatrix}\) is the transposed Jacobian matrix of \(h(x)\).

  • \(\nabla^{2}_{\mu,\mu} L(x, \mu) = 0\) is a zero matrix, as \(L(x, \mu)\) is linear in \(\mu\).
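As a numerical illustration of the block system above, the following NumPy sketch assembles the KKT matrix and performs a single full Newton step for the assumed example problem from before, minimize \(x_{1} + x_{2}\) subject to \(x_{1}^{2} + x_{2}^{2} - 2 = 0\):

```python
# One Newton-KKT step for the assumed example problem
#   minimize x1 + x2   subject to   h(x) = x1^2 + x2^2 - 2 = 0,
# whose solution is x* = (-1, -1), mu* = 1/2.
import numpy as np

def grad_f(x):                          # gradient of the (linear) objective
    return np.array([1.0, 1.0])

def h(x):                               # equality constraint as a 1-vector
    return np.array([x[0]**2 + x[1]**2 - 2.0])

def jac_h(x):                           # Jacobian B of h, shape (1, 2)
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

def hess_L(x, mu):                      # Q = hess f + sum_j mu_j * hess h_j (hess f = 0)
    return mu[0] * 2.0 * np.eye(2)

x, mu = np.array([-2.0, -1.0]), np.array([1.0])

B, Q = jac_h(x), hess_L(x, mu)
KKT = np.block([[Q, B.T], [B, np.zeros((1, 1))]])       # [[Q, B^T], [B, 0]]
rhs = -np.concatenate([grad_f(x) + B.T @ mu, h(x)])     # -[grad_x L, h]
d = np.linalg.solve(KKT, rhs)                           # d = (Delta x, Delta mu)
x, mu = x + d[:2], mu + d[2:]                           # full step, alpha = 1
print(x, mu)                            # iterate moves toward (-1, -1), 0.5
```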

The derived Newton equation is identical to the KKT optimality conditions of the following quadratic optimization problem:

(2)#\[\begin{split} \begin{array}{lll} \textrm{minimize} & F(\Delta x) &:= \frac{1}{2} (\Delta x)^{T} Q \Delta x + \nabla_{x} L(x,\mu)^{T} \Delta x \\ \textrm{subject to} & H(\Delta x) &:= B \Delta x + h(x) = 0. \end{array} \end{split}\]

Introducing Lagrange multipliers \(\lambda\) for the equality constraints (which will turn out to equal \(\Delta \mu\)), the corresponding Lagrangian is:

\[ \mathcal{L}(\Delta x,\lambda) = F(\Delta x) + H(\Delta x)^{T} \lambda \]

and the KKT optimality conditions are:

\[\begin{split} \begin{aligned} \nabla_{\Delta x} \mathcal{L}(\Delta x,\lambda) &= Q \Delta x + \nabla_{x} L(x,\mu) + B^{T} \lambda = 0, \\ \nabla_{\lambda} \mathcal{L}(\Delta x,\lambda) &= H(\Delta x) = B \Delta x + h(x) = 0. \end{aligned} \end{split}\]

or, written in matrix notation and identifying \(\lambda = \Delta \mu\):

\[\begin{split} \begin{pmatrix} Q & B^{T} \\ B & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta \mu \end{pmatrix} = -\begin{pmatrix} \nabla_{x} L(x,\mu) \\ h(x) \end{pmatrix}. \end{split}\]
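Repeating this step gives the basic full-step (\(\alpha = 1\)) SQP iteration. The following sketch, again on the assumed example problem, solves the quadratic sub-problem (2) in each iteration via its KKT system:

```python
# Full-step SQP iteration for the assumed example problem
#   minimize x1 + x2   subject to   x1^2 + x2^2 - 2 = 0.
import numpy as np

grad_f = lambda x: np.array([1.0, 1.0])
h      = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
jac_h  = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
hess_L = lambda x, mu: mu[0] * 2.0 * np.eye(2)      # hess f = 0 (linear objective)

x, mu = np.array([-2.0, -1.0]), np.array([1.0])
for k in range(20):
    B, Q = jac_h(x), hess_L(x, mu)
    grad_L = grad_f(x) + B.T @ mu
    if np.linalg.norm(grad_L) < 1e-10 and np.linalg.norm(h(x)) < 1e-10:
        break                                        # KKT conditions satisfied
    KKT = np.block([[Q, B.T], [B, np.zeros((1, 1))]])
    d = np.linalg.solve(KKT, -np.concatenate([grad_L, h(x)]))
    x, mu = x + d[:2], mu + d[2:]                    # Newton step with alpha = 1

print(k, x, mu)      # converges to x* = (-1, -1), mu* = 0.5 in a few iterations
```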

Summary#

  • SQP methods solve a sequence of quadratic optimization sub-problems (quadratic objective function and linearized constraints).

  • In the unconstrained case, SQP reduces to Newton’s method for finding a stationary point \(\nabla f(x) = 0\).

  • In the purely equality-constrained case, the SQP method is equivalent to applying Newton’s method to the KKT optimality conditions.

  • Extensions for equality and inequality constraints exist, but are not discussed in this seminar.

  • The same problems and limitations discussed in the section about Newton’s method also apply to the SQP method:

    • Computing the Hessian matrix is expensive.

    • If the Hessian matrix is not symmetric positive definite or the constraint gradients are not linearly independent (LICQ violated) during the iteration, convergence cannot be guaranteed.

  • See exercise RM04 for an application example.
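As a practical cross-check (separate from exercise RM04), SciPy’s SLSQP method, an SQP variant that also handles inequality constraints, can be applied to the same assumed example problem:

```python
# SciPy's SLSQP applied to the assumed example problem
#   minimize x1 + x2   subject to   x1^2 + x2^2 - 2 = 0.
import numpy as np
from scipy.optimize import minimize

res = minimize(
    fun=lambda x: x[0] + x[1],                            # objective f
    x0=np.array([-2.0, -1.0]),                            # starting point
    method="SLSQP",
    constraints=[{"type": "eq",                           # equality constraint h(x) = 0
                  "fun": lambda x: x[0]**2 + x[1]**2 - 2.0}],
)
print(res.x)        # approximately (-1, -1)
```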