Penalty methods#

In contrast to barrier methods, penalty methods solve a sequence of unconstrained optimization problems whose solutions are usually infeasible for the original constrained problem. Infeasibility is penalized by a term added to the objective; as the weight of this penalty term is increased, the iterates are forced towards the feasible region.

Consider the equality-constrained problem

\[\begin{split} \begin{array}{ll} \textrm{minimize} & f(x) \\ \textrm{subject to} & h(x) = 0, \\ \end{array} \end{split}\]

where \(h(x)\) is a \(p\)-dimensional vector whose \(j\)-th component \(h_{j}(x)\) is twice continuously differentiable.

The best-known penalty is the quadratic-loss function

\[ \psi(x) := \frac{1}{2} \sum_{j = 1}^{p} h_{j}(x)^{2} = \frac{1}{2} h(x)^{T}h(x). \]

The weight of the penalty is controlled by a positive penalty parameter \(\rho\). The penalty method consists of solving a sequence of unconstrained minimization problems of the form

\[ \min\limits_{x} \pi(x,\rho^{k}) = f(x) + \rho^{k}\psi(x) \]

for an increasing sequence \(\{ \rho^{k} \}\) of positive values tending to infinity.
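This outer loop is straightforward to sketch in code. The following is a minimal illustration, assuming SciPy's `scipy.optimize.minimize` is used to solve the unconstrained subproblems; the callables `f` and `h`, the starting point `x0`, and the update factor for \(\rho\) are placeholders chosen for illustration, not part of the text above.

```python
import numpy as np
from scipy.optimize import minimize

def quadratic_penalty_method(f, h, x0, rho0=1.0, factor=10.0, n_outer=8):
    """Sketch of a quadratic-loss penalty method for: minimize f(x) s.t. h(x) = 0.

    f  : callable returning the scalar objective value
    h  : callable returning a 1-D array of equality-constraint values
    x0 : initial guess for the first unconstrained subproblem
    """
    x, rho = np.asarray(x0, dtype=float), rho0
    for _ in range(n_outer):
        # Subproblem: pi(x, rho) = f(x) + (rho / 2) * ||h(x)||^2
        pi = lambda x, rho=rho: f(x) + 0.5 * rho * np.sum(np.asarray(h(x)) ** 2)
        x = minimize(pi, x, method="BFGS").x   # warm start from previous minimizer
        rho *= factor                          # increase the penalty parameter
    return x
```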

Penalty methods share many of the properties of barrier methods. Under appropriate conditions, the sequence of penalty-function minimizers defines a continuous trajectory, and along this trajectory it is possible to obtain estimates of the Lagrange multipliers at the solution.

Consider the quadratic-loss penalty function

\[ \pi(x,\rho) = f(x) + \frac{1}{2} \rho \sum_{j = 1}^{p} h_{j}(x)^{2}. \]

Its minimizer \(x(\rho)\) satisfies

\[ \nabla_{x} \pi(x(\rho),\rho) = \nabla f(x(\rho)) + \rho \sum_{j = 1}^{p} \nabla h_{j}(x(\rho)) h_{j}(x(\rho)) = 0. \]

Defining \(\lambda_{j}(\rho) := -\rho \cdot h_{j}(x(\rho))\) one obtains

\[ \nabla f(x(\rho)) - \sum_{j = 1}^{p} \lambda_{j}(\rho) \nabla h_{j}(x(\rho)) = 0. \]

If \(x(\rho)\) converges to a solution \(x^{*}\) that is a regular point of the constraints, then \(\lambda(\rho)\) converges to the Lagrange multiplier \(\lambda^{*}\) associated with \(x^{*}\).
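In practice this means a multiplier estimate comes essentially for free once a penalty subproblem has been solved. A minimal sketch (the names `h` and `x_rho` are hypothetical, standing for the constraint function and a computed subproblem minimizer):

```python
import numpy as np

def multiplier_estimate(h, x_rho, rho):
    """Lagrange multiplier estimate lambda_j(rho) = -rho * h_j(x(rho))."""
    return -rho * np.asarray(h(x_rho))
```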

Penalty functions suffer from the same problems of ill-conditioning as barrier functions. As the penalty parameter increases, the condition number of the Hessian matrix of \(\pi(x(\rho), \rho)\) increases, tending to \(\infty\) as \(\rho \to \infty\). Therefore the unconstrained minimization problems can become increasingly difficult to solve.

PM: Example 1#

Consider the problem

\[\begin{split} \begin{array}{lll} \textrm{minimize} & f(x) =& -x_{1}x_{2} \\ \textrm{subject to} & h(x) =& x_{1} + 2 x_{2} - 4 = 0. \end{array} \end{split}\]

Then the penalty method solves a sequence of unconstrained minimization problems

\[ \min\limits_{x} \pi(x,\rho^{k}) = -x_{1}x_{2} + \frac{1}{2} \rho^{k} (x_{1} + 2 x_{2} - 4)^{2} \]

for increasing values \(\rho^{k}\). The necessary conditions for optimality are

\[\begin{split} \begin{aligned} -x_{2} + \rho(x_{1} + 2 x_{2} - 4) &= 0 \\ -x_{1} + 2\rho(x_{1} + 2 x_{2} - 4) &= 0. \end{aligned} \end{split}\]

For \(\rho > 1/4\) this yields the solution

\[\begin{split} \begin{aligned} x_{1}(\rho) &= \frac{8 \rho}{4 \rho - 1}, \\ x_{2}(\rho) &= \frac{4 \rho}{4 \rho - 1}, \end{aligned} \end{split}\]

which is a local as well as a global minimizer. (The unconstrained problem has no minimum if \(\rho \leq 1/4\).) Note that \(x(\rho)\) is infeasible for the original constrained problem, since

\[ h(x(\rho)) = x_{1} + 2x_{2} - 4 = \frac{16 \rho}{4 \rho - 1} - 4 = \frac{4}{4 \rho - 1}. \]
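As a quick numerical sanity check (a sketch using plain Python, with \(\rho = 10\) chosen arbitrarily), one can plug the closed-form \(x(\rho)\) into the stationarity conditions and into the constraint:

```python
rho = 10.0
x1 = 8 * rho / (4 * rho - 1)             # ~2.051
x2 = 4 * rho / (4 * rho - 1)             # ~1.026

# Stationarity conditions of pi(x, rho): both residuals should be ~0
print(-x2 + rho * (x1 + 2 * x2 - 4))
print(-x1 + 2 * rho * (x1 + 2 * x2 - 4))

# Constraint violation: h(x(rho)) = 4 / (4*rho - 1) ~ 0.103, i.e. infeasible
print(x1 + 2 * x2 - 4, 4 / (4 * rho - 1))
```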

At any solution \(x(\rho)\) we can define a Lagrange multiplier estimate

\[ \lambda = -\rho \cdot h(x(\rho)) = \frac{-4 \rho}{4 \rho - 1}. \]

As \(\rho\) tends to \(\infty\) one obtains

\[ \lim_{\rho \to \infty} x_{1}(\rho) = \lim_{\rho \to \infty} \frac{2}{1 - 1/(4\rho)} = 2, \qquad \lim_{\rho \to \infty} x_{2}(\rho) = \lim_{\rho \to \infty} \frac{1}{1 - 1/(4\rho)} = 1, \]

and indeed \(x^{*} = (2, 1)^{T}\) is the minimizer for the constrained problem. Further,

\[ \lim_{\rho \to \infty} \lambda(\rho) = \lim_{\rho \to \infty} \frac{-1}{1 - 1/(4\rho)} = -1, \]

and indeed \(\lambda^{*} = -1\) is the Lagrange multiplier at \(x^{*}\).
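These limits can also be observed numerically. The sketch below, assuming NumPy and SciPy, minimizes \(\pi(x,\rho^{k})\) for a few increasing values of \(\rho\), warm-starting each subproblem from the previous minimizer; the starting point and the particular sequence of penalty parameters are arbitrary choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: -x[0] * x[1]
h = lambda x: x[0] + 2 * x[1] - 4

x = np.array([1.0, 1.0])
for rho in [1.0, 10.0, 100.0, 1000.0]:
    pi = lambda x, rho=rho: f(x) + 0.5 * rho * h(x) ** 2
    x = minimize(pi, x, method="BFGS").x     # warm start from previous minimizer
    lam = -rho * h(x)                        # multiplier estimate
    print(rho, x, lam)                       # x -> (2, 1), lam -> -1
```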

The ill-conditioning of the penalty function is demonstrated by the Hessian matrix at \(x(\rho)\):

\[\begin{split} \nabla_{x}^{2} \pi(x(\rho),\rho) = \begin{pmatrix} \rho & 2\rho - 1 \\ 2\rho - 1 & 4\rho \end{pmatrix}. \end{split}\]

It can be shown that its condition number is approximately \(25 \rho / 4\). When \(\rho\) is large, the Hessian matrix is ill-conditioned.
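This growth is easy to observe numerically; a small sketch assuming NumPy:

```python
import numpy as np

for rho in [1.0, 10.0, 100.0, 1000.0]:
    H = np.array([[rho,         2 * rho - 1],
                  [2 * rho - 1, 4 * rho    ]])
    # The computed condition number grows roughly like 25 * rho / 4
    print(rho, np.linalg.cond(H), 25 * rho / 4)
```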

It is also possible to apply penalty methods to problems with inequality constraints. Consider for example the problem

\[\begin{split} \begin{array}{ll} \textrm{minimize} & f(x) \\ \textrm{subject to} & g_{i}(x) \geq 0, \quad i = 1, \ldots, m. \\ \end{array} \end{split}\]

The quadratic-loss penalty in this case is

\[ \psi(x) := \frac{1}{2} \sum_{i = 1}^{m} \left[ \min (g_{i}(x), 0) \right]^{2}. \]

This function has continuous first derivatives

\[ \nabla \psi(x) = \sum_{i = 1}^{m} \left[ \min (g_{i}(x), 0) \right] \nabla g_{i}(x), \]

but its second derivatives can be discontinuous at points where some constraint \(g_{i}\) is satisfied exactly. Hence, a careful implementation is necessary. One cannot safely use Newton’s method to minimize the function. For this reason, straightforward penalty methods have not been widely used for solving general inequality-constrained problems.
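A minimal sketch of this penalty and its gradient, assuming the constraints are supplied as a callable `g` returning the vector \((g_{1}(x), \ldots, g_{m}(x))\) and `grad_g` returning the corresponding Jacobian (both names are placeholders, not from the text above):

```python
import numpy as np

def inequality_penalty(g, x):
    """psi(x) = 0.5 * sum_i [min(g_i(x), 0)]^2 for constraints g_i(x) >= 0."""
    v = np.minimum(np.asarray(g(x)), 0.0)     # per-constraint violation
    return 0.5 * np.sum(v ** 2)

def inequality_penalty_grad(g, grad_g, x):
    """Gradient: sum_i min(g_i(x), 0) * grad g_i(x).

    grad_g(x) is assumed to return the m-by-n Jacobian, one row per constraint.
    """
    v = np.minimum(np.asarray(g(x)), 0.0)
    return np.asarray(grad_g(x)).T @ v
```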

For a convergence discussion of penalty methods, see [GNS08] (chapter 16.2.3).