Deriving Lagrange’s multiplier

We wish to consider a special type of optimization problem:

Find the maximum (or minimum) of a function f(x,y) subject to the condition g(x,y)=0\quad\quad(1)

If it is possible to solve g(x,y)=0 for y so that y is expressed explicitly as y=\psi(x), then substituting for y in (1) turns the problem into

Find the maximum (or minimum) of the single-variable function f(x, \psi(x)).

In the case that y cannot be obtained by solving g(x,y)=0 explicitly, we restate the problem as:

Find the maximum (or minimum) of the single-variable function z=f(x,y), where y is a function of x implicitly defined by g(x, y)=0\quad\quad\quad(2)

Following the traditional procedure for finding the maximum (or minimum) of a single-variable function, we differentiate z with respect to x:

\frac{dz}{dx} = f_x(x,y) + f_y(x,y)\cdot \frac{dy}{dx}\quad\quad\quad(3)

Similarly, differentiating both sides of g(x,y)=0 with respect to x,

g_x(x,y) + g_y(x,y)\cdot \frac{dy}{dx}=0\quad\quad\quad(4)
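For instance, with a sample constraint g(x, y)=x^2+y^2-1 (my choice, purely for illustration), a quick SymPy sketch confirms that implicit differentiation produces exactly \frac{dy}{dx}=\frac{-g_x}{g_y}:

```python
# Sketch: verify dy/dx from implicit differentiation against -g_x/g_y
# for a sample constraint g(x, y) = x^2 + y^2 - 1 = 0 (an assumption,
# chosen only for illustration).
from sympy import symbols, idiff, diff, simplify

x, y = symbols('x y')
g = x**2 + y**2 - 1            # sample constraint g(x, y) = 0

dydx = idiff(g, y, x)          # dy/dx via implicit differentiation
assert simplify(dydx - (-diff(g, x) / diff(g, y))) == 0
print(dydx)                    # -x/y
```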

Grouping (3) with the constraint g(x, y)=0, we have

\begin{cases} \frac{dz}{dx}= f_x(x, y)+f_y(x, y)\cdot \frac{dy}{dx}\\ g(x,y) = 0\end{cases}\quad\quad\quad(5)

The fact that \frac{dz}{dx}= 0 at any stationary point x^* means that at each such point (x^*, y^*), where g(x^*, y^*)=0,

\begin{cases} f_x(x^*, y^*)+f_y(x^*, y^*)\cdot \frac{dy}{dx}\vert_{x=x^*}=0 \\ g(x^*,y^*) = 0\end{cases}\quad\quad\quad(6)

If g_y(x^*,y^*) \ne 0 then from (4),

\frac{dy}{dx}\vert_{x=x^*} = \frac{-g_x(x^*, y^*)}{g_y(x^*, y^*)}

Substituting it into (6) gives

\begin{cases} f_x(x^*, y^*)+f_y(x^*, y^*)\cdot (\frac{-g_x(x^*, y^*)}{g_y(x^*, y^*)})=f_x(x^*, y^*)+g_x(x^*, y^*)\cdot (\frac{-f_y(x^*, y^*)}{g_y(x^*, y^*)})=0\\ g(x^*,y^*) = 0\end{cases}\quad\quad\quad(7)

Let \lambda = \frac{-f_y(x^*, y^*)}{g_y(x^*, y^*)}. Clearing the denominator, we have

f_y(x^*, y^*) + \lambda g_y(x^*, y^*) =0\quad\quad\quad(8)

Combining (7) and (8) gives

\begin{cases} f_x(x^*, y^*)+\lambda g_x(x^*, y^*) = 0 \\ f_y(x^*, y^*)+\lambda g_y(x^*, y^*)=0 \\ g(x^*, y^*) = 0\end{cases}

It follows that to find the stationary points of z, we solve

\begin{cases} f_x(x, y)+\lambda g_x(x, y) = 0 \\ f_y(x, y)+\lambda g_y(x, y)=0 \\ g(x, y) = 0\end{cases}\quad\quad\quad(9)

for x, y and \lambda.

This is known as the method of Lagrange multipliers.

Let F(x,y,\lambda) = f(x,y) + \lambda g(x,y).

Since

F_x(x,y,\lambda) = f_x(x,y) + \lambda g_x(x,y),

F_y(x,y,\lambda)=f_y(x,y) + \lambda  g_y(x,y),

F_{\lambda}(x,y,\lambda) = g(x, y),

(9) is equivalent to

\begin{cases} F_x(x, y, \lambda)=0 \\ F_y(x,y,\lambda)=0 \\ F_{\lambda}(x, y, \lambda) = 0\end{cases}\quad\quad\quad(10)
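System (10) is mechanical enough to hand to a computer algebra system. Below is a minimal SymPy sketch; the helper name lagrange_stationary is mine, and the sample problem (extremize f = xy subject to x+y=1) is an assumption chosen only for illustration:

```python
# Minimal sketch of system (10): form F = f + lam*g and solve
# F_x = F_y = F_lam = 0. The helper name lagrange_stationary is mine.
from sympy import symbols, diff, solve

x, y, lam = symbols('x y lambda')

def lagrange_stationary(f, g):
    """Return solutions of F_x = F_y = F_lam = 0 for F = f + lam*g."""
    F = f + lam * g
    return solve([diff(F, x), diff(F, y), diff(F, lam)],
                 [x, y, lam], dict=True)

# sample problem (an assumption): extremize f = x*y subject to x + y = 1
print(lagrange_stationary(x*y, x + y - 1))
# [{x: 1/2, y: 1/2, lambda: -1/2}]
```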

Let’s look at some examples.

Example-1 Find the minimum of f(x, y) = x^2+y^2 subject to the condition that x+y=4

Let F(x, y, \lambda) = x^2+y^2+\lambda(x+y-4).

Solving

\begin{cases}F_x=2x+\lambda=0  \\ F_y = 2y+\lambda = 0 \\ F_{\lambda} = x+y-4=0\end{cases}

for x, y, \lambda gives x=y=2, \lambda=-4.

When x=2, y=2, x^2+y^2=2^2+2^2=8.

For all (x, y) \ne (2, 2) with x+y=4, we have

(x-2)^2 + (y-2)^2 > 0.

That is,

x^2-4x+4 + y^2-4y+4 = x^2+y^2-4(x+y)+8 \overset{x+y=4}{=} x^2+y^2-16+8>0.

Hence,

x^2+y^2>8, (x,y) \ne (2,2).

The objective function x^2+y^2 with the constraint x+y=4 thus attains its global minimum at (x, y) = (2, 2).
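The stationary point can also be checked with SymPy, along the same lines as the sketch above:

```python
# Check Example-1: the only stationary point of
# F = x^2 + y^2 + lam*(x + y - 4) is x = y = 2 with lam = -4,
# where the objective equals 8.
from sympy import symbols, diff, solve

x, y, lam = symbols('x y lambda')
F = x**2 + y**2 + lam * (x + y - 4)

sol = solve([diff(F, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sol)                          # [{x: 2, y: 2, lambda: -4}]
print((x**2 + y**2).subs(sol[0]))   # 8
```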

I first encountered this problem in junior high school and solved it as follows:

(x-y)^2 \ge 0 \implies x^2+y^2 \ge 2xy\quad\quad\quad(11)

x+y=4\implies (x+y)^2=16\implies x^2+2xy +y^2=16

\implies 2xy=16-(x^2+y^2)\overset{(11)}{\implies} x^2+y^2 \ge 16-(x^2+y^2)

\implies 2\cdot(x^2+y^2) \ge 16 \implies x^2+y^2 \ge 8\implies z_{min} = 8.

I solved it again in high school, when quadratic equations were discussed:

x+y=4 \implies y =4-x

z=x^2+y^2 \implies z = x^2+(4-x)^2 \implies 2x^2-8x+16-z=0

\Delta =  64-4 \cdot 2\cdot (16-z) \ge 0 \implies z \ge 8\implies z_{min} = 8

In my freshman calculus class, I solved it yet again:

x+y=4 \implies y=4-x

z = x^2+(4-x)^2

\frac{dz}{dx} = 2x+2(4-x)(-1)=2x-8+2x=4x-8

\frac{dz}{dx} =0  \implies x=2

\frac{d^2 z}{dx^2} = 4 > 0 \implies \text{at } x=2,\; z_{min}=2^2+(4-2)^2=8

Example-2 Find the shortest distance from the point (1,0) to the parabola y^2=4x.

We minimize f = (x-1)^2+y^2 where y^2=4x.

If we eliminate y^2 in f, then f = (x-1)^2+4x. Solving \frac{df}{dx} = 2(x-1)+4=2x+2=0 gives x=-1. Clearly, this is not valid, for it would suggest that y^2=-4 from y^2=4x, an absurdity. The elimination silently drops the restriction x = \frac{y^2}{4} \ge 0; on x \ge 0, f=(x-1)^2+4x=(x+1)^2 is increasing, so the constrained minimum sits at the boundary x=0, which is not a critical point of the substituted function.
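A short SymPy sketch of this domain argument (not the Lagrange method):

```python
# Sketch of the pitfall: eliminating y drops the restriction x >= 0.
# On x >= 0, f = (x - 1)^2 + 4x = (x + 1)^2 is increasing, so the
# minimum is at the boundary x = 0, not at the critical point x = -1.
from sympy import symbols, factor, minimum, Interval, oo

x = symbols('x')
f = (x - 1)**2 + 4*x

print(factor(f))                        # (x + 1)**2
print(minimum(f, x, Interval(0, oo)))   # 1, attained at x = 0
```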

By Lagrange’s method, we solve

\begin{cases} 2(x-1)-4\lambda=0 \\2y\lambda+2y = 0 \\y^2-4x=0\end{cases}

Fig. 1

The only valid solution is x=0, y=0, \lambda=-\frac{1}{2}. At (x, y) = (0, 0), f=(0-1)^2+0^2=1. It is the global minimum:

For all (x, y) \ne (0, 0) with y^2=4x, we have x>0, and therefore

(x-1)^2+y^2 \overset{y^2=4x}{=}(x-1)^2+4x=x^2-2x+1+4x=x^2+2x+1\overset{x>0}{>}1=f(0,0)
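The system can again be checked with SymPy; declaring the symbols real discards the complex branch:

```python
# Check Example-2: with real symbols, the Lagrange system has the
# single solution (x, y, lambda) = (0, 0, -1/2); the branch lambda = -1
# forces y^2 = -4 and is discarded.
from sympy import symbols, diff, solve

x, y, lam = symbols('x y lambda', real=True)
F = (x - 1)**2 + y**2 + lam * (y**2 - 4*x)

sol = solve([diff(F, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sol)   # [{x: 0, y: 0, lambda: -1/2}]
```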

Example-3 Find the shortest distance from the point (a, b) to the line Ax+By+C=0.

We want to find a point (x_0, y_0) on the line Ax+By+C=0 such that the distance between (a, b) and (x_0, y_0) is minimal.

To this end, we minimize (x_0-a)^2+(y_0-b)^2 subject to Ax_0+By_0+C=0 (see Fig. 2).

Fig. 2

Let F(x_0, y_0, \lambda) = (x_0-a)^2+(y_0-b)^2+\lambda(Ax_0+By_0+C) and solve

\begin{cases} F_{x_0}=2(x_0-a)+\lambda A = 0 \\ F_{y_0}=2(y_0-b)+\lambda B = 0 \\ F_{\lambda}=Ax_0+By_0+C=0 \end{cases}

for x_0, y_0 and \lambda. We found that

x_0=\frac{aB^2-bAB-AC}{A^2+B^2}, y_0=\frac{bA^2-aAB-BC}{A^2+B^2}

and the distance between (a, b) and (x_0, y_0) is

\frac{|Aa+Bb+C|}{\sqrt{A^2+B^2}}\quad\quad\quad(12)
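Both x_0, y_0 and formula (12) can be reproduced symbolically; a sketch:

```python
# Sketch of Example-3's computation: solve the Lagrange system for
# (x0, y0, lambda) and confirm the squared distance equals
# (A*a + B*b + C)^2 / (A^2 + B^2), i.e. formula (12).
from sympy import symbols, diff, solve, simplify

a, b, A, B, C, lam = symbols('a b A B C lambda', real=True)
x0, y0 = symbols('x0 y0', real=True)

F = (x0 - a)**2 + (y0 - b)**2 + lam * (A*x0 + B*y0 + C)
sol = solve([diff(F, v) for v in (x0, y0, lam)], [x0, y0, lam], dict=True)[0]

print(simplify(sol[x0]))   # (a*B**2 - A*b*B - A*C)/(A**2 + B**2)
print(simplify(sol[y0]))   # (b*A**2 - a*A*B - B*C)/(A**2 + B**2)

dist2 = (sol[x0] - a)**2 + (sol[y0] - b)**2
print(simplify(dist2 - (A*a + B*b + C)**2 / (A**2 + B**2)))   # 0
```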

To show that (12) is indeed the minimal distance, consider any (x, y) \ne (x_0, y_0) with Ax+By+C=0.

Let d_1 = x-x_0, d_2=y-y_0; then

x = x_0 + d_1, y=y_0 + d_2, \quad (d_1, d_2) \ne (0, 0).

Since Ax+By+C=0,

A(x_0+d_1)+B(y_0+d_2)+C=Ax_0+Ad_1+By_0+Bd_2+C=0

That is

Ax_0+By_0+C+Ad_1+Bd_2=0.

By the fact that Ax_0+By_0+C=0, we have

Ad_1 + Bd_2 =0\quad\quad\quad(13)

Computing (x-a)^2+(y-b)^2 - ((x_0-a)^2+(y_0-b)^2) (see Fig. 3)

Fig. 3

yields

-\frac{2d_2BC}{B^2+A^2}-\frac{2d_1AC}{B^2+A^2}+\frac{d_2^2B^2}{B^2+A^2}-\frac{2bd_2B^2}{B^2+A^2}+\frac{d_1^2B^2}{B^2+A^2}-\frac{2ad_2AB}{B^2+A^2}-\frac{2bd_1AB}{B^2+A^2}+\frac{d_2^2A^2}{B^2+A^2}+\frac{d_1^2A^2}{B^2+A^2}-\frac{2ad_1A^2}{B^2+A^2}

After some rearrangement and factoring, it becomes

\frac{-2C}{A^2+B^2}(Ad_1+Bd_2)+\frac{-2bB}{A^2+B^2}(Ad_1+Bd_2)+\frac{-2aA}{A^2+B^2}(Ad_1+Bd_2) + d_1^2+d_2^2

By (13), it reduces to

d_1^2 + d_2^2.

Since (d_1, d_2) \ne (0, 0), this is a positive quantity. Therefore,

\forall (x, y) \ne (x_0, y_0), Ax+By+C=0 \implies (x-a)^2+(y-b)^2> (x_0-a)^2+(y_0-b)^2
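This last reduction can also be verified symbolically; a sketch assuming B \ne 0, so that d_2 can be eliminated via (13):

```python
# Verify the final step of Example-3: with (x, y) = (x0 + d1, y0 + d2)
# on the line, the difference of squared distances reduces to
# d1^2 + d2^2. We assume B != 0 and eliminate d2 via (13): A*d1 + B*d2 = 0.
from sympy import symbols, simplify

a, b, A, B, C, d1 = symbols('a b A B C d1', real=True)
x0 = (a*B**2 - b*A*B - A*C) / (A**2 + B**2)
y0 = (b*A**2 - a*A*B - B*C) / (A**2 + B**2)

d2 = -A*d1/B                      # from (13), assuming B != 0
x, y = x0 + d1, y0 + d2
diff_sq = (x - a)**2 + (y - b)**2 - ((x0 - a)**2 + (y0 - b)**2)

print(simplify(diff_sq - (d1**2 + d2**2)))   # 0
```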