Chapter 1 Extremisation, constraints and Lagrange multipliers
The purpose of this chapter is to re-derive in the finite-dimensional case standard results on constrained extremisation so as to motivate and clarify analogous results needed in the functional case. It is based on chapters II.1, II.2, II.5 of [8]. The discussion on constraints and regularity conditions is taken from chapter 1 of [11].
1.1 Unconstrained extremisation
The problem consists of finding the extrema (= stationary points) of a function \(F(x^1,\dots ,x^n)\) of \(n\) variables \(x^a\), \(a=1,\dots ,n\), in the interior of a domain. We write the variation of such a function under variations \(\delta x^a\) of the variables \(x^a\) as
\[ \label {eq:26} \boxed {\delta F=\delta x^a\frac {\partial F}{\partial x^a}}. \]
Here and throughout, we use the summation convention over repeated indices, \(\delta x^a\frac {\partial F}{\partial x^a}=\sum _{a=1}^n\delta x^a\frac {\partial F}{\partial x^a}\). Since for a collection of functions \(f_a(x^b)\),
\(\seteqnumber{0}{1.}{0}\)\begin{equation} \label {eq:34} f_a\delta x^a=0, \forall \delta x^a\Longrightarrow f_a=0, \end{equation}
it follows that the extrema \(\bar x^a\) are determined by the vanishing of the gradient of \(F\),
\(\seteqnumber{0}{1.}{1}\)\begin{equation} \label {eq:27} \boxed {\delta F=0\iff \frac {\partial F}{\partial x^a}|_{x=\bar x}=0}. \end{equation}
Recall further that the nature (maximum, minimum, saddle point) of each extremum is determined by the eigenvalues of the matrix
\(\seteqnumber{0}{1.}{2}\)\begin{equation} \label {eq:28} \frac {\partial ^2 F}{\partial x^a\partial x^b}|_{x=\bar x}. \end{equation}
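For instance, the function \(F(x^1,x^2)=(x^1)^2-(x^2)^2\) has a single extremum at \(\bar x^1=\bar x^2=0\), where
\[ \frac {\partial ^2 F}{\partial x^a\partial x^b}|_{x=\bar x}=\begin{pmatrix} 2 & 0\\ 0 & -2 \end{pmatrix}. \]
Since the eigenvalues \(2\) and \(-2\) have opposite signs, this extremum is a saddle point.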
1.2 Constraints and regularity conditions
Very often, one is interested in an extremisation problem where the variables \(x^a\) are subjected to constraints, that is to say, where \(r\) independent relations
\(\seteqnumber{0}{1.}{3}\)\begin{equation} \label {eq:29} G_m(x)=0,\quad m=1,\dots ,r, \end{equation}
are supposed to hold. Standard regularity conditions on the functions \(G_m\) are that the rank of the matrix \(\frac {\partial G_m}{\partial x^a}\) is \(r\) (locally, in a neighborhood of \(G_m=0\)). This means that one may choose locally a new system of coordinates such that the \(G_m\) are \(r\) of the new coordinates, \(x^a\longleftrightarrow q^\alpha , G_m\), with \(\alpha =1,\dots ,n-r\).
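For instance, for \(n=2\) and the single constraint \(G_1=(x^1)^2+(x^2)^2-1\), the matrix \(\frac {\partial G_1}{\partial x^a}=(2x^1,2x^2)\) has rank \(1\) in a neighborhood of the unit circle \(G_1=0\), and the polar angle together with \(G_1\) itself,
\[ q^1=\arctan (x^2/x^1),\quad G_1, \]
provides a local adapted coordinate system away from the origin.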
Lemma 1. A function \(f(x)\) vanishes whenever the constraints hold if and only if there exist functions \(g^m(x)\) such that
\(\seteqnumber{0}{1.}{4}\)\begin{equation} f=g^mG_m. \end{equation}
That the condition is sufficient (\(\Longleftarrow \)) is obvious. That it is necessary (\(\Longrightarrow \)) is shown in the new coordinate system, where \(f(q^\alpha ,G_m=0)=0\) and thus
\(\seteqnumber{0}{1.}{5}\)\begin{equation} \label {eq:30} f(q^\alpha ,G_m)=\int ^1_0d\tau \frac {d}{d\tau } f(q^\alpha ,\tau G_m)=G_m\int ^1_0d\tau \frac {\partial f}{\partial G_m}(q^\alpha ,\tau G_m), \end{equation}
so that \(g^m=\int ^1_0d\tau \frac {\partial f}{\partial G_m}(q^\alpha ,\tau G_m)\).
In the following, we will use the notation \(f\approx 0\) for a function that vanishes when the constraints hold, so that the lemma becomes \(f\approx 0\iff f=g^mG_m\).
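As an illustration, for the single constraint \(G_1=(x^1)^2+(x^2)^2-1\), the function \(f=\big ((x^1)^2+(x^2)^2\big )^2-1\approx 0\), and indeed
\[ f=g^1G_1,\quad g^1=(x^1)^2+(x^2)^2+1. \]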
Lemma 2. Suppose that the constraints hold and that one restricts oneself to variations \(\delta x^a\) tangent to the constraint surface,
\(\seteqnumber{0}{1.}{6}\)\begin{equation} \label {eq:31} G_m=0, \quad \frac {\partial G_m}{\partial x^a}\delta x^a\approx 0. \end{equation}
It follows that
\(\seteqnumber{0}{1.}{7}\)\begin{equation} \label {eq:32} \boxed {f_a\delta x^a\approx 0,\quad \forall \delta x^a\ {\rm satisfying}\ \eqref {eq:31} \iff f_a\approx \mu ^m \frac {\partial G_m}{\partial x^a}}, \end{equation}
for some \(\mu ^m(x^a)\).
NB: In this lemma, the use of \(\approx \) means that the equalities in (1.7) and (1.8) are understood as holding when \(G_m=0\).
That the condition is sufficient \((\Longleftarrow )\) is again direct when contracting the expression for \(f_a\) on the RHS of (1.8) with \(\delta x^a\) and using the second equation of (1.7).
That the condition is necessary (\(\Longrightarrow \)) follows by first subtracting from the LHS of (1.8) the combination \(\lambda ^m\frac {\partial G_m}{\partial x^a}\delta x^a\approx 0\), with arbitrary \(\lambda ^m\), which gives
\(\seteqnumber{0}{1.}{8}\)\begin{equation} \label {eq:35} (f_a-\lambda ^m\frac {\partial G_m}{\partial x^a})\delta x^a\approx 0. \end{equation}
Without loss of generality (by renaming some of the variables if necessary), we can assume that
\(\seteqnumber{0}{1.}{9}\)\begin{equation} \label {eq:36} \det \Big (\frac {\partial G_m}{\partial x^a}\Big )\neq 0,\quad {\rm for}\ a= n-r+1,\dots , n. \end{equation}
One can prove, using the implicit function theorem, that this implies that one may locally solve the constraints for the last \(r\) coordinates,
\(\seteqnumber{0}{1.}{10}\)\begin{equation} \label {eq:37} G_m=0\iff x^\Delta =X^\Delta (x^\alpha ),\quad \Delta =n-r+1,\dots , n,\quad \alpha =1,\dots , n-r. \end{equation}
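For instance, with \(n=2\), \(r=1\) and \(G_1=(x^1)^2+(x^2)^2-1\), in a neighborhood of a point with \(x^2>0\) one has \(\frac {\partial G_1}{\partial x^2}=2x^2\neq 0\) and
\[ G_1=0\iff x^2=X^2(x^1)=\sqrt {1-(x^1)^2}. \]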
In this case, one refers to the \(x^\Delta \) as dependent and to the \(x^\alpha \) as independent variables. This also implies that, locally, there exists an invertible matrix \(M^{m\Delta }\) such that
\(\seteqnumber{0}{1.}{11}\)\begin{equation} \label {eq:38} M^{m\Delta }\frac {\partial G_{m'}}{\partial x^\Delta }\approx \delta ^m_{m'},\quad \frac {\partial G_m}{\partial x^\Delta }M^{m\Gamma }\approx \delta ^\Gamma _\Delta . \end{equation}
Indeed, according to Lemma 1, there exist functions \(M^{\Delta m}\) such that
\(\seteqnumber{0}{1.}{12}\)\begin{equation} \label {eq:458} x^\Delta -X^\Delta =M^{\Delta m}G_m. \end{equation}
Conversely, when the surface is described by the constraints \(x^\Gamma -X^\Gamma =0\), there exist functions \(M_{m\Gamma }\) such that
\(\seteqnumber{0}{1.}{13}\)\begin{equation} \label {eq:459} G_m=M_{m\Gamma }(x^\Gamma -X^\Gamma ). \end{equation}
Substituting this in the previous equation, differentiating with respect to the dependent coordinates and evaluating on the constraint surface, it follows that
\(\seteqnumber{0}{1.}{14}\)\begin{equation} \label {eq:460} M^{\Delta m}M_{m\Gamma }\approx \delta ^{\Delta }_\Gamma . \end{equation}
The second equation of (1.12) then follows by differentiating (1.13) with respect to \(x^\Gamma \),
\(\seteqnumber{0}{1.}{15}\)\begin{equation} \label {eq:45} \delta ^\Delta _\Gamma =\frac {\partial M^{\Delta m}}{\partial x^\Gamma } G_m+M^{\Delta m}\frac {\partial G_m}{\partial x^\Gamma }, \end{equation}
and the first because the right and the left inverses of a matrix coincide.
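In the example above, where \(x^2=X^2(x^1)=\sqrt {1-(x^1)^2}\), the factorisation \(G_1=(x^2-X^2)(x^2+X^2)\) gives
\[ M^{21}=\frac {1}{x^2+\sqrt {1-(x^1)^2}}\approx \frac {1}{2x^2}, \]
and one checks directly that \(M^{21}\frac {\partial G_1}{\partial x^2}\approx \frac {2x^2}{2x^2}=1\), in agreement with (1.12).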
In terms of the various types of variables, (1.9) may be decomposed as
\(\seteqnumber{0}{1.}{16}\)\begin{equation} \label {eq:39} (f_\Delta -\lambda ^m\frac {\partial G_m}{\partial x^\Delta })\delta x^\Delta +(f_\alpha -\lambda ^m\frac {\partial G_m}{\partial x^\alpha })\delta x^\alpha \approx 0. \end{equation}
Since the \(\lambda ^m\) are arbitrary, we are free to fix
\(\seteqnumber{0}{1.}{17}\)\begin{equation} \lambda ^m=M^{m\Gamma }f_\Gamma :=\mu ^m \iff f_\Delta \approx \mu ^m \frac {\partial G_m}{\partial x^\Delta }, \label {eq:41} \end{equation}
so that the first term in (1.17) vanishes. We are then left with
\(\seteqnumber{0}{1.}{18}\)\begin{equation} \label {eq:40} (f_\alpha -\mu ^m\frac {\partial G_m}{\partial x^\alpha })\delta x^\alpha \approx 0. \end{equation}
Since the variables \(x^\alpha \) are unconstrained and the variations \(\delta x^\alpha \) independent, this implies, as in (1.1), that
\(\seteqnumber{0}{1.}{19}\)\begin{equation} \label {eq:42} f_\alpha \approx \mu ^m\frac {\partial G_m}{\partial x^\alpha }. \end{equation}
□
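As an illustration of Lemma 2, for the single constraint \(G_1=(x^1)^2+(x^2)^2-1\), variations tangent to the unit circle satisfy \(x^1\delta x^1+x^2\delta x^2\approx 0\), and \(f_a\delta x^a\approx 0\) for all such variations if and only if \(f_a\) is proportional to the normal direction,
\[ f_a\approx \mu ^1\frac {\partial G_1}{\partial x^a}=2\mu ^1 x^a. \]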
1.3 Constrained extremisation
Extremisation of a function \(F(x^a)\) under \(r\) constraints \(G_m=0\) can then be done in two equivalent ways:
Method 1: The first is to solve the constraints as in (1.11) and to extremise the function
\(\seteqnumber{0}{1.}{20}\)\begin{equation} \boxed {F^G(x^\alpha )=F(x^\alpha ,X^\Delta (x^\beta ))}.\label {eq:33} \end{equation}
In terms of the unconstrained variables \(x^\alpha \), we are back to an unconstrained extremisation problem, whose extrema are determined by
\(\seteqnumber{0}{1.}{21}\)\begin{equation} \label {eq:43} \boxed {\frac {\partial F^G}{\partial x^\alpha }=0}. \end{equation}
Indeed, on the one hand,
\(\seteqnumber{0}{1.}{22}\)\begin{equation} \label {eq:46} \frac {\partial F^G}{\partial x^\alpha }=\frac {\partial F}{\partial x^\alpha }|_{x^\Delta =X^\Delta } +\frac {\partial F}{\partial x^\Gamma }|_{x^\Delta =X^\Delta }\frac {\partial X^\Gamma }{\partial x^\alpha }. \end{equation}
On the other hand, differentiating (1.13) with respect to \(x^\alpha \),
\(\seteqnumber{0}{1.}{23}\)\begin{equation} \label {eq:47} \frac {\partial X^\Gamma }{\partial x^\alpha }=\frac {\partial (X^\Gamma -x^\Gamma )}{\partial x^\alpha }=-\frac {\partial M^{\Gamma m}}{\partial x^\alpha } G_m-M^{\Gamma m}\frac {\partial G_m}{\partial x^\alpha }, \end{equation}
so that, if \(G_m=0\), or equivalently, if one substitutes \(x^\Delta \) by \(X^\Delta \), the extremality condition (1.21) may be written as
\(\seteqnumber{0}{1.}{24}\)\begin{equation} \label {eq:48} \frac {\partial F}{\partial x^\alpha }\approx \mu ^m\frac {\partial G_m}{\partial x^\alpha },\quad \mu ^m\approx \frac {\partial F}{\partial x^\Gamma }M^{\Gamma m}. \end{equation}
Furthermore, if \(G_m=0\), one may also write
\(\seteqnumber{0}{1.}{25}\)\begin{equation} \label {eq:49} \frac {\partial F}{\partial x^\Delta }=\frac {\partial F}{\partial x^\Gamma }\delta ^\Gamma _\Delta \approx \frac {\partial F}{\partial x^\Gamma }M^{\Gamma m}\frac {\partial G_m}{\partial x^\Delta }\approx \mu ^m\frac {\partial G_m}{\partial x^\Delta }, \end{equation}
by using (1.16) and the definition of \(\mu ^m\) in (1.24).
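As a simple example of Method 1, consider \(F=(x^1)^2+(x^2)^2\) with the single constraint \(G_1=x^1+x^2-1\). Solving the constraint as \(x^2=X^2(x^1)=1-x^1\) gives
\[ F^G(x^1)=(x^1)^2+(1-x^1)^2,\quad \frac {\partial F^G}{\partial x^1}=4x^1-2=0\iff \bar x^1=\frac {1}{2}, \]
so that the constrained extremum lies at \(\bar x^1=\bar x^2=\frac {1}{2}\).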
Method 2: Instead of restricting oneself to the space of independent variables \(x^a\), one extends the space of variables \(x^a\) by additional variables \(\lambda ^m\), called Lagrange multipliers, and one considers the unconstrained extremisation of the function,
\(\seteqnumber{0}{1.}{26}\)\begin{equation} \label {eq:50} \boxed {F^\lambda (x^a,\lambda ^m)=F(x^a)-\lambda ^m G_m, \quad \delta F^\lambda =0}. \end{equation}
By unconstrained extremisation, we understand that the variables \(x^a,\lambda ^m\) are considered as unconstrained at the outset, with independent variations \(\delta x^a,\delta \lambda ^m\).
That this gives the same extrema can be seen as follows:
\(\seteqnumber{0}{1.}{27}\)\begin{equation} \label {eq:51} 0=\delta F^\lambda =(\frac {\partial F}{\partial x^a}-\lambda ^m\frac {\partial G_m}{\partial x^a})\delta x^a-\delta \lambda ^m G_m, \end{equation}
implies
\(\seteqnumber{0}{1.}{28}\)\begin{equation} \label {eq:52} \frac {\partial F}{\partial x^a}=\lambda ^m\frac {\partial G_m}{\partial x^a},\quad G_m=0. \end{equation}
A variation of the second equation then implies that \(\frac {\partial G_m}{\partial x^a}\delta x^a=0\), while contraction of the first equation with \(\delta x^a\) gives (1.9) of the proof of Lemma 2. We can then continue as in the proof of this lemma to conclude that \(\frac {\partial F}{\partial x^a}\approx \mu ^m\frac {\partial G_m}{\partial x^a}\) for some \(\mu ^m\), and thus that both methods define the same extrema.
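For the example above, Method 2 amounts to extremising \(F^\lambda =(x^1)^2+(x^2)^2-\lambda ^1(x^1+x^2-1)\). The equations (1.28) read
\[ 2x^1=\lambda ^1,\quad 2x^2=\lambda ^1,\quad x^1+x^2-1=0, \]
and give \(\bar x^1=\bar x^2=\frac {1}{2}\) with \(\lambda ^1=1\), the same extremum as in Method 1.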