Optimality conditions (NLP)

This page describes necessary and sufficient conditions for local optimality in (continuous) nonlinear programming (NLP) optimization formulations.

When solving a continuous nonlinear minimization problem:

minimize $f(\mathbf{x})$

subject to $\mathbf{g}(\mathbf{x})\leq \mathbf{0}$

$\mathbf{h}(\mathbf{x}) = \mathbf{0}$
$\mathbf{x}\subseteq\Re^{n}$

a global minimum is a point such that no other feasible point has a lower objective function value. A local minimum is a point such that all feasible points in the neighborhood immediately surrounding the point have higher objective function values.

Because local optimality depends only on the small neighborhood surrounding the point, it is possible to derive necessary and sufficient conditions for local optimality using a Taylor series expansion. NLP algorithms in general search for points that satisfy these conditions. If the NLP formulation is convex, then there is only one local minimum, which is also the global minimum. If the formulation is nonconvex, these conditions do not guarantee global optimality.

Unconstrained NLP

For an unconstrained problem:

minimize $f(\mathbf{x})$

a candidate point $\mathbf{x}_\dagger$ can be tested for local optimality by writing the Taylor series expansion about the point:

$f = f(\mathbf{x}_\dagger) + \bigg(\frac{\partial f}{\partial \mathbf{x}}\bigg)_\dagger\partial \mathbf{x} + \frac{1}{2} \partial \mathbf{x}^{T} \bigg(\frac{\partial ^{2} f}{\partial \mathbf{x}^{2}}\bigg)_\dagger \partial \mathbf{x} + ...$

where $\partial \mathbf{x} = (\mathbf{x} - \mathbf{x}_\dagger)$

If $\mathbf{x}_\dagger$ is a local minimum, then, by definition, it must be true that all points in the immediate neighborhood have a higher objective function value - i.e.: there exists an ε > 0 such that

$f(\mathbf{x}_\dagger + \partial \mathbf{x}) - f(\mathbf{x}_\dagger) > 0$

$\forall \partial \mathbf{x}:||\partial \mathbf{x}|| < \epsilon$

Necessary condition

For small $\partial\mathbf{x}$ (i.e.: for the region in the immediate neighborhood of the candidate point), the first order term of the Taylor series dominates, and higher order terms are negligible. Since the first order term alone cannot satisfy the condition of a local minimum, the linear term must be zero as a necessary condition (so that higher order terms will determine the nature of the point):

$\bigg(\frac{\partial f}{\partial \mathbf{x}}\bigg)_\dagger = \mathbf{0}$

which is also written as

$\nabla_{\mathbf{x}}f = \mathbf{0}$

Any point satisfying this necessary condition is called a stationary point.

Sufficient condition

If the necessary condition is satisfied, the second order term of the Taylor series will dominate for small $\partial\mathbf{x}$. If the second order term is positive for all (small) $\partial\mathbf{x}$, this is sufficient to determine a minimum:

$\partial\mathbf{x}^{T}\bigg(\frac{\partial^{2}f}{\partial\mathbf{x}^{2}}\bigg)_\dagger \partial\mathbf{x}> \mathbf{0}; \forall \partial \mathbf{x}$

This condition is equivalent to saying that $\frac{\partial^{2}f}{\partial\mathbf{x}^{2}}$ is positive definite at the point $\mathbf{x}_\dagger$, which can be tested by checking that all eigenvalues of the Hessian matrix at the candidate point are positive.

Common Mistake: While it is common to mention only the second order term in discussing sufficiency conditions for optimality, the Taylor series derivation implies other possible sufficiency conditions: If the second order term is zero for any particular direction $\partial\mathbf{x}$, then higher order terms dominate along those directions, and their effects must be examined. A common mistake is to believe that a point with both first and second derivatives of zero is necessarily an inflection point. This is not necessarily the case: For example, consider the function f(x) = x4, which has a minimum at zero where the first three derivatives are zero and the fourth order term dominates, creating the sufficiency condition.

Constrained NLP

If the NLP optimization problem has constraints,

minimize $f(\mathbf{x})$

subject to $\mathbf{g}(\mathbf{x})\leq \mathbf{0}$

$\mathbf{h}(\mathbf{x}) = \mathbf{0}$

the problem can be reformulated in Lagrangian form to derive analogous optimality conditions. These conditions are again derived from a Taylor series expansion in the reduced space, although the derivation is not provided here:

Necessary condition

The Karush-Kuhn-Tucker (KKT) conditions are the first-order necessary conditions for constrained problems (optimality conditions for unconstrained problems are a special case). They can be stated as:

$\nabla L(\mathbf{x_*}) = \nabla f(\mathbf{x_*}) + \mathbf{\lambda}^T \nabla \mathbf{h(x_*)} + \mathbf{\mu}^T \nabla \mathbf{g(x_*)} = \mathbf{0}^T$
$\mathbf{h(x_*})=0$
$\mathbf{g(x_*)} \leq 0$

where $\mathbf{x_*}$ is the solution satisfied KKT condition, and $\mathbf{\lambda}^T \neq 0 \mbox{ , } \mathbf{\mu}^T g=0 \mbox{ , } \mathbf{\mu} \geq 0$

Sufficient condition

The second-order sufficient condition for local optimality in unconstrained models is:

$\partial \mathbf{x}^T \mathbf{H^*} \partial \mathbf{x} > 0$

where the Hessian matrix $\mathbf{H^*} = \nabla^2 L(\mathbf{x_*})$. To satisfy the sufficient condition, the Hessian matrix needs to be positive definite on the subspace tangent to the active constraints. Because a Hessian matrix is symmetric, it is positive definite when all eigenvalues are positive. A point that satisfies these necessary and sufficient conditions is a local minimum.