Chapter 1 Extremisation, constraints and Lagrange multipliers
The purpose of this chapter is to re-derive in the finite-dimensional case standard results on constrained extremisation so as to motivate and clarify analogous results needed in the functional case. It is based on chapters II.1, II.2, II.5 of [8]. The discussion on constraints and regularity conditions is taken from chapter 1 of [11].
1.1 Unconstrained extremisation
The problem consists of finding the extrema (= stationary points) of a function \(F(x^1,\dots ,x^n)\) of \(n\) variables \(x^a\), \(a=1,\dots ,n\), in the interior of a domain. We write the variation of such a function under variations \(\delta x^a\) of the variables \(x^a\) as
\[ \label {eq:26} \boxed {\delta F=\delta x^a\frac {\partial F}{\partial x^a}}. \]
Here and throughout, we use the summation convention over repeated indices, \(\delta x^a\frac {\partial F}{\partial x^a}=\sum _{a=1}^n\delta x^a\frac {\partial F}{\partial x^a}\). Since for a collection of functions \(f_a(x^b)\),
\(\seteqnumber{0}{1.}{0}\)\begin{equation} \label {eq:34} f_a\delta x^a=0, \forall \delta x^a\Longrightarrow f_a=0, \end{equation}
it follows that the extrema \(\bar x^a\) are determined by the vanishing of the gradient of \(F\),
\(\seteqnumber{0}{1.}{1}\)\begin{equation} \label {eq:27} \boxed {\delta F=0\iff \frac {\partial F}{\partial x^a}|_{x=\bar x}=0}. \end{equation}
Recall further that the nature (maximum, minimum, saddle point) of each extremum is determined by the eigenvalues of the matrix
\(\seteqnumber{0}{1.}{2}\)\begin{equation} \label {eq:28} \frac {\partial ^2 F}{\partial x^a\partial x^b}|_{x=\bar x}. \end{equation}
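For instance, the function \(F(x^1,x^2)=(x^1)^2-(x^2)^2\) has a single extremum at \(\bar x^1=\bar x^2=0\), where
\[ \frac {\partial ^2 F}{\partial x^a\partial x^b}|_{x=\bar x}=\begin{pmatrix} 2 & 0\\ 0 & -2 \end{pmatrix}. \]
Since the eigenvalues \(2\) and \(-2\) have opposite signs, this extremum is a saddle point.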
1.2 Constraints and regularity conditions
Very often, one is interested in an extremisation problem where the variables \(x^a\) are subjected to constraints, that is to say, where \(r\) independent relations
\(\seteqnumber{0}{1.}{3}\)\begin{equation} \label {eq:29} G_m(x)=0,\quad m=1,\dots ,r, \end{equation}
are supposed to hold. Standard regularity conditions on the functions \(G_m\) are that the rank of the matrix \(\frac {\partial G_m}{\partial x^a}\) is \(r\) (locally, in a neighborhood of \(G_m=0\)). This means that one may choose locally a new system of coordinates such that the \(G_m\) are \(r\) of the new coordinates, \(x^a\longleftrightarrow q^\alpha , G_m\), with \(\alpha =1,\dots ,n-r\).
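For instance, for \(n=2\) and the single constraint \(G_1=(x^1)^2+(x^2)^2-1\), the matrix \(\frac {\partial G_1}{\partial x^a}=(2x^1,2x^2)\) has rank \(1\) in a neighborhood of the unit circle \(G_1=0\), and the polar angle together with \(G_1\) itself,
\[ q^1=\arctan (x^2/x^1),\quad G_1, \]
provides a local adapted coordinate system away from the origin.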
Lemma 1. A function \(f(x)\) vanishes whenever the constraints hold if and only if there exist functions \(g^m(x)\) such that
\(\seteqnumber{0}{1.}{4}\)\begin{equation} f=g^mG_m. \end{equation}
That the condition is sufficient (\(\Longleftarrow \)) is obvious. That it is necessary (\(\Longrightarrow \)) is shown in the new coordinate system, where \(f(q^\alpha ,G_m=0)=0\) and thus
\(\seteqnumber{0}{1.}{5}\)\begin{equation} \label {eq:30} f(q^\alpha ,G_m)=\int ^1_0d\tau \frac {d}{d\tau } f(q^\alpha ,\tau G_m)=G_m\int ^1_0d\tau \frac {\partial f}{\partial G_m}(q^\alpha ,\tau G_m), \end{equation}
so that \(g^m=\int ^1_0d\tau \frac {\partial f}{\partial G_m}(q^\alpha ,\tau G_m)\).
In the following, we will use the notation \(f\approx 0\) for a function that vanishes when the constraints hold, so that the lemma becomes \(f\approx 0\iff f=g^mG_m\).
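As an illustration, for the single constraint \(G_1=(x^1)^2+(x^2)^2-1\), the function \(f=\big ((x^1)^2+(x^2)^2\big )^2-1\approx 0\), and indeed
\[ f=g^1G_1,\quad g^1=(x^1)^2+(x^2)^2+1. \]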
Lemma 2. Suppose that the constraints hold and that one restricts oneself to variations \(\delta x^a\) tangent to the constraint surface,
\(\seteqnumber{0}{1.}{6}\)\begin{equation} \label {eq:31} G_m=0, \quad \frac {\partial G_m}{\partial x^a}\delta x^a\approx 0. \end{equation}
It follows that
\(\seteqnumber{0}{1.}{7}\)\begin{equation} \label {eq:32} \boxed {f_a\delta x^a\approx 0,\quad \forall \delta x^a\ {\rm satisfying}\ \eqref {eq:31} \iff f_a\approx \mu ^m \frac {\partial G_m}{\partial x^a}}, \end{equation}
for some \(\mu ^m(x^a)\).
NB: In this lemma, the use of \(\approx \) means that the equalities in (1.7) and (1.8) are understood as holding when \(G_m=0\).
That the condition is sufficient \((\Longleftarrow )\) is again direct when contracting the expression for \(f_a\) on the RHS of (1.8) with \(\delta x^a\) and using the second equation of (1.7).
That the condition is necessary (\(\Longrightarrow \)) follows by first subtracting from the LHS of (1.8) the combination \(\lambda ^m\frac {\partial G_m}{\partial x^a}\delta x^a\approx 0\), with arbitrary \(\lambda ^m\), which gives
\(\seteqnumber{0}{1.}{8}\)\begin{equation} \label {eq:35} (f_a-\lambda ^m\frac {\partial G_m}{\partial x^a})\delta x^a\approx 0. \end{equation}
Without loss of generality (by renaming some of the variables if necessary), we can assume that
\(\seteqnumber{0}{1.}{9}\)\begin{equation} \label {eq:36} \det \Big (\frac {\partial G_m}{\partial x^a}\Big )\neq 0,\quad {\rm for}\ a= n-r+1,\dots , n. \end{equation}
One can prove, using the implicit function theorem, that this implies that one may locally solve the constraints for the last \(r\) coordinates,
\(\seteqnumber{0}{1.}{10}\)\begin{equation} \label {eq:37} G_m=0\iff x^\Delta =X^\Delta (x^\alpha ),\quad \Delta =n-r+1,\dots , n,\quad \alpha =1,\dots , n-r. \end{equation}
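For instance, with \(n=2\), \(r=1\) and \(G_1=(x^1)^2+(x^2)^2-1\), in a neighborhood of a point with \(x^2>0\) one has \(\frac {\partial G_1}{\partial x^2}=2x^2\neq 0\) and
\[ G_1=0\iff x^2=X^2(x^1)=\sqrt {1-(x^1)^2}. \]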
In this case, one refers to the \(x^\Delta \) as dependent and to the \(x^\alpha \) as independent variables. This also implies that, locally, there exists an invertible matrix \(M^{m\Delta }\) such that
\(\seteqnumber{0}{1.}{11}\)\begin{equation} \label {eq:38} M^{m\Delta }\frac {\partial G_{m'}}{\partial x^\Delta }\approx \delta ^m_{m'},\quad \frac {\partial G_m}{\partial x^\Delta }M^{m\Gamma }\approx \delta ^\Gamma _\Delta . \end{equation}
Indeed, according to Lemma 1, there exist functions \(M^{\Delta m}\) such that
\(\seteqnumber{0}{1.}{12}\)\begin{equation} \label {eq:458} x^\Delta -X^\Delta =M^{\Delta m}G_m. \end{equation}
Conversely, when the surface is described by the constraints \(x^\Gamma -X^\Gamma =0\), there exist functions \(M_{m\Gamma }\) such that
\(\seteqnumber{0}{1.}{13}\)\begin{equation} \label {eq:459} G_m=M_{m\Gamma }(x^\Gamma -X^\Gamma ). \end{equation}
Substituting this in the previous equation, differentiating with respect to the dependent coordinates and evaluating on the constraint surface, it follows that
\(\seteqnumber{0}{1.}{14}\)\begin{equation} \label {eq:460} M^{\Delta m}M_{m\Gamma }\approx \delta ^{\Delta }_\Gamma . \end{equation}
The second equation of (1.12) then follows by differentiating (1.13) with respect to \(x^\Gamma \),
\(\seteqnumber{0}{1.}{15}\)\begin{equation} \label {eq:45} \delta ^\Delta _\Gamma =\frac {\partial M^{\Delta m}}{\partial x^\Gamma } G_m+M^{\Delta m}\frac {\partial G_m}{\partial x^\Gamma }, \end{equation}
and the first because the right and the left inverses of a matrix coincide.
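In the example above, where \(x^2=X^2(x^1)=\sqrt {1-(x^1)^2}\), the factorisation \(G_1=(x^2-X^2)(x^2+X^2)\) gives
\[ M^{21}=\frac {1}{x^2+\sqrt {1-(x^1)^2}}\approx \frac {1}{2x^2}, \]
and one checks directly that \(M^{21}\frac {\partial G_1}{\partial x^2}\approx \frac {2x^2}{2x^2}=1\), in agreement with (1.12).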
In terms of the various types of variables, (1.9) may be decomposed as
\(\seteqnumber{0}{1.}{16}\)\begin{equation} \label {eq:39} (f_\Delta -\lambda ^m\frac {\partial G_m}{\partial x^\Delta })\delta x^\Delta +(f_\alpha -\lambda ^m\frac {\partial G_m}{\partial x^\alpha })\delta x^\alpha \approx 0. \end{equation}
Since the \(\lambda ^m\) are arbitrary, we are free to fix
\(\seteqnumber{0}{1.}{17}\)\begin{equation} \lambda ^m=M^{m\Gamma }f_\Gamma :=\mu ^m \iff f_\Delta \approx \mu ^m \frac {\partial G_m}{\partial x^\Delta }, \label {eq:41} \end{equation}
so that the first term in (1.17) vanishes. We are then left with
\(\seteqnumber{0}{1.}{18}\)\begin{equation} \label {eq:40} (f_\alpha -\mu ^m\frac {\partial G_m}{\partial x^\alpha })\delta x^\alpha \approx 0. \end{equation}
Since the variables \(x^\alpha \) are unconstrained and the variations \(\delta x^\alpha \) independent, this implies, as in (1.1), that
\(\seteqnumber{0}{1.}{19}\)\begin{equation} \label {eq:42} f_\alpha \approx \mu ^m\frac {\partial G_m}{\partial x^\alpha }. \end{equation}
□
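As an illustration of Lemma 2, for the single constraint \(G_1=(x^1)^2+(x^2)^2-1\), variations tangent to the unit circle satisfy \(x^1\delta x^1+x^2\delta x^2\approx 0\), and \(f_a\delta x^a\approx 0\) for all such variations if and only if \(f_a\) is proportional to the normal direction,
\[ f_a\approx \mu ^1\frac {\partial G_1}{\partial x^a}=2\mu ^1 x^a. \]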
1.3 Constrained extremisation
Extremisation of a function \(F(x^a)\) under \(r\) constraints \(G_m=0\) can then be done in two equivalent ways:
Method 1: The first is to solve the constraints as in (1.11) and to extremise the function
\(\seteqnumber{0}{1.}{20}\)\begin{equation} \boxed {F^G(x^\alpha )=F(x^\alpha ,X^\Delta (x^\beta ))}.\label {eq:33} \end{equation}
In terms of the unconstrained variables \(x^\alpha \), we are back to an unconstrained extremisation problem, whose extrema are determined by
\(\seteqnumber{0}{1.}{21}\)\begin{equation} \label {eq:43} \boxed {\frac {\partial F^G}{\partial x^\alpha }=0}. \end{equation}
Indeed, on the one hand,
\(\seteqnumber{0}{1.}{22}\)\begin{equation} \label {eq:46} \frac {\partial F^G}{\partial x^\alpha }=\frac {\partial F}{\partial x^\alpha }|_{x^\Delta =X^\Delta } +\frac {\partial F}{\partial x^\Gamma }|_{x^\Delta =X^\Delta }\frac {\partial X^\Gamma }{\partial x^\alpha }. \end{equation}
On the other hand, differentiating (1.13) with respect to \(x^\alpha \),
\(\seteqnumber{0}{1.}{23}\)\begin{equation} \label {eq:47} \frac {\partial X^\Gamma }{\partial x^\alpha }=\frac {\partial (X^\Gamma -x^\Gamma )}{\partial x^\alpha }=-\frac {\partial M^{\Gamma m}}{\partial x^\alpha } G_m-M^{\Gamma m}\frac {\partial G_m}{\partial x^\alpha }, \end{equation}
so that, if \(G_m=0\), or equivalently, if one substitutes \(x^\Delta \) by \(X^\Delta \), the extremality condition (1.21) may be written as
\(\seteqnumber{0}{1.}{24}\)\begin{equation} \label {eq:48} \frac {\partial F}{\partial x^\alpha }\approx \mu ^m\frac {\partial G_m}{\partial x^\alpha },\quad \mu ^m\approx \frac {\partial F}{\partial x^\Gamma }M^{\Gamma m}. \end{equation}
Furthermore, if \(G_m=0\), one may also write
\(\seteqnumber{0}{1.}{25}\)\begin{equation} \label {eq:49} \frac {\partial F}{\partial x^\Delta }=\frac {\partial F}{\partial x^\Gamma }\delta ^\Gamma _\Delta \approx \frac {\partial F}{\partial x^\Gamma }M^{\Gamma m}\frac {\partial G_m}{\partial x^\Delta }\approx \mu ^m\frac {\partial G_m}{\partial x^\Delta }, \end{equation}
by using (1.16) and the definition of \(\mu ^m\) in (1.24).
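As a simple example of Method 1, consider \(F=(x^1)^2+(x^2)^2\) with the single constraint \(G_1=x^1+x^2-1\). Solving the constraint as \(x^2=X^2(x^1)=1-x^1\) gives
\[ F^G(x^1)=(x^1)^2+(1-x^1)^2,\quad \frac {\partial F^G}{\partial x^1}=4x^1-2=0\iff \bar x^1=\frac {1}{2}, \]
so that the constrained extremum lies at \(\bar x^1=\bar x^2=\frac {1}{2}\).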
Method 2: Instead of restricting oneself to the space of independent variables \(x^a\), one extends the space of variables \(x^a\) by additional variables \(\lambda ^m\), called Lagrange multipliers, and one considers the unconstrained extremisation of the function,
\(\seteqnumber{0}{1.}{26}\)\begin{equation} \label {eq:50} \boxed {F^\lambda (x^a,\lambda ^m)=F(x^a)-\lambda ^m G_m, \quad \delta F^\lambda =0}. \end{equation}
By unconstrained extremisation, we understand that the variables \(x^a,\lambda ^m\) are considered as unconstrained at the outset, with independent variations \(\delta x^a,\delta \lambda ^m\).
That this gives the same extrema can be seen as follows:
\(\seteqnumber{0}{1.}{27}\)\begin{equation} \label {eq:51} 0=\delta F^\lambda =(\frac {\partial F}{\partial x^a}-\lambda ^m\frac {\partial G_m}{\partial x^a})\delta x^a-\delta \lambda ^m G_m, \end{equation}
implies
\(\seteqnumber{0}{1.}{28}\)\begin{equation} \label {eq:52} \frac {\partial F}{\partial x^a}=\lambda ^m\frac {\partial G_m}{\partial x^a},\quad G_m=0. \end{equation}
A variation of the second equation then implies that \(\frac {\partial G_m}{\partial x^a}\delta x^a=0\), while contraction of the first equation with \(\delta x^a\) gives (1.9) of the proof of Lemma 2. We can then continue as in the proof of this lemma to conclude that \(\frac {\partial F}{\partial x^a}\approx \mu ^m\frac {\partial G_m}{\partial x^a}\) for some \(\mu ^m\), and thus that both methods define the same extrema.
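For the example above, Method 2 amounts to extremising \(F^\lambda =(x^1)^2+(x^2)^2-\lambda ^1(x^1+x^2-1)\). The equations (1.28) read
\[ 2x^1=\lambda ^1,\quad 2x^2=\lambda ^1,\quad x^1+x^2-1=0, \]
and give \(\bar x^1=\bar x^2=\frac {1}{2}\) with \(\lambda ^1=1\), the same extremum as in Method 1.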