June 14, 2006

The Chain Rule… inside out!

Posted in Mathematics at 11:39 am by pmatos

The Chain Rule is an extremely useful tool, whether you’re working in single-variable or multi-variable calculus. Since I had some fun with it today, I’ll give a brief explanation here… hopefully it will help someone in the future!

Chain Rule for single-variable functions ([tex]f : \mathbb{R} \rightarrow \mathbb{R}[/tex])

This is the simplest case, and it is where you should start if you want to understand the rule for the general case.

Basically, consider [tex]g(t) = f(r(t))[/tex]; then we simply have [tex]g'(t) = f'(r(t)) \cdot r'(t)[/tex].
As an example, consider:
[tex]f(x) = 1/x[/tex] and
[tex]r(t) = t^2, t \neq 0 [/tex].
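So if we wish to compute the derivative of [tex]g(t) = f(r(t))[/tex], we just need to apply the chain rule above, using [tex]f'(x) = -1/x^2[/tex] and [tex]r'(t) = 2t[/tex]:
[tex]g'(t) = f'(r(t)) \cdot r'(t) = [/tex]
[tex] = f'(t^2) \cdot 2t = [/tex]
[tex] = -\frac{1}{(t^2)^2} \cdot 2t = \frac{-2t}{t^4} = \frac{-2}{t^3} [/tex]
This agrees with differentiating [tex]g(t) = 1/t^2 = t^{-2}[/tex] directly.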
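To double-check this result numerically, here is a small Python sketch (my own addition, not part of the original derivation). It evaluates the chain-rule formula, the simplified closed form [tex]-2/t^3[/tex] and a central-difference approximation of [tex]g'(t)[/tex] at a few points with [tex]t \neq 0[/tex]. The helper names (`f_prime`, `r_prime`, `central_difference`, …) and the step size `h` are illustrative choices.

```python
# Numerical sanity check of g(t) = f(r(t)) with f(x) = 1/x and r(t) = t**2, t != 0.

def f(x):
    return 1.0 / x


def f_prime(x):
    # f(x) = 1/x  =>  f'(x) = -1/x**2 (x != 0)
    return -1.0 / x**2


def r(t):
    return t**2


def r_prime(t):
    return 2.0 * t


def g(t):
    return f(r(t))


def g_prime_chain(t):
    # Chain rule: g'(t) = f'(r(t)) * r'(t)
    return f_prime(r(t)) * r_prime(t)


def g_prime_closed(t):
    # Simplified form: -2t / t**4 = -2 / t**3
    return -2.0 / t**3


def central_difference(fn, t, h=1e-5):
    # Independent numerical estimate of fn'(t)
    return (fn(t + h) - fn(t - h)) / (2 * h)


for t in (0.5, 1.0, -2.0, 3.0):
    print(t, g_prime_chain(t), g_prime_closed(t), central_difference(g, t))
```

The three columns should agree up to the small error of the finite-difference estimate.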

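I left one obvious point unmentioned: why I had to restrict [tex]r[/tex]. Since [tex]D_f = ]-\infty, 0[ \cup ]0, +\infty[ [/tex], while [tex]r[/tex] is defined on all of [tex]\mathbb{R}[/tex] with [tex]r(0) = 0[/tex], the composition [tex]f(r(t))[/tex] is undefined at [tex]t = 0[/tex]. So this is only valid if we restrict [tex]r[/tex] to [tex]t \neq 0[/tex].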

Now, onto a more general case.

Chain Rule for multi-variable functions ([tex]f : \mathbb{R}^n \rightarrow \mathbb{R}[/tex])

So, let [tex]f[/tex] be a scalar field defined on an open set [tex]S[/tex] in [tex]\mathbb{R}^n[/tex], and let [tex]r[/tex] be a vector-valued function that maps an interval [tex]J[/tex] of [tex]\mathbb{R}^1[/tex] into [tex]S[/tex]. Define the composite function [tex]g = f \circ r[/tex] on [tex]J[/tex] by the equation:
[tex]g(t) = f[r(t)][/tex] if [tex]t \in J[/tex]

Let [tex]t[/tex] be a point in [tex]J[/tex] at which [tex]r'(t)[/tex] exists and assume that [tex]f[/tex] is differentiable at [tex]r(t)[/tex]. Then [tex]g'(t)[/tex] exists and is equal to the dot product:
[tex]g'(t) = \nabla f(a) \cdot r'(t)[/tex], where [tex]a = r(t)[/tex]

Note that [tex]\nabla f(a)[/tex], known as the gradient of [tex]f[/tex] at [tex]a[/tex], is the vector whose components are the partial derivatives of [tex]f[/tex] at [tex]a[/tex]:
[tex]\nabla f(a) = (D_1 f(a), \ldots, D_n f(a))[/tex]

So [tex]\nabla f[/tex] is a vector field, defined at each point [tex]a[/tex] where the partial derivatives [tex]D_1 f(a), \ldots, D_n f(a)[/tex] exist.
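The statement [tex]g'(t) = \nabla f(a) \cdot r'(t)[/tex] translates almost literally into code. Below is a minimal Python sketch, assuming we only have [tex]f[/tex] as a black box: the gradient is approximated by central differences in each coordinate (an approximation I chose for illustration, not something from the text), then dotted with [tex]r'(t)[/tex]. The function names and the example [tex]f(x, y, z) = xyz[/tex] along [tex]r(t) = (t, t^2, t^3)[/tex] are hypothetical; that example is picked because [tex]g(t) = t^6[/tex], so the exact answer [tex]6t^5[/tex] is easy to check.

```python
from typing import Callable, List, Sequence

ScalarField = Callable[[Sequence[float]], float]


def gradient(f: ScalarField, a: Sequence[float], h: float = 1e-6) -> List[float]:
    """Approximate grad f(a) = (D_1 f(a), ..., D_n f(a)) by central differences."""
    grad = []
    for k in range(len(a)):
        plus, minus = list(a), list(a)
        plus[k] += h
        minus[k] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def chain_rule_derivative(f: ScalarField,
                          r: Callable[[float], Sequence[float]],
                          r_prime: Callable[[float], Sequence[float]],
                          t: float) -> float:
    """g'(t) = grad f(a) . r'(t) with a = r(t), for g = f o r."""
    a = r(t)
    return dot(gradient(f, a), r_prime(t))


# Hypothetical example: f(x, y, z) = x*y*z along r(t) = (t, t^2, t^3),
# so g(t) = t^6 and g'(t) = 6 t^5 exactly.
def f(p: Sequence[float]) -> float:
    x, y, z = p
    return x * y * z


def r(t: float) -> List[float]:
    return [t, t**2, t**3]


def r_prime(t: float) -> List[float]:
    return [1.0, 2 * t, 3 * t**2]


print(chain_rule_derivative(f, r, r_prime, 1.2), 6 * 1.2**5)
```

If the partial derivatives are known in closed form, you would of course plug them in instead of the finite-difference `gradient`.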

As an example, consider [tex]F = f \circ r[/tex] with [tex]n = 2[/tex], that is, [tex]f : \mathbb{R}^2 \rightarrow \mathbb{R}[/tex] and [tex]r : \mathbb{R} \rightarrow \mathbb{R}^2[/tex], so that [tex]F : \mathbb{R} \rightarrow \mathbb{R}[/tex]. They are given by [tex]f(x, y) = x^2 + y^2[/tex] and [tex]r(t) = (t, t^2)[/tex]; let’s compute [tex]F'(t)[/tex] and [tex]F''(t)[/tex].

By using the chain rule we have: [tex]F'(t) = \nabla f(r(t)) \cdot r'(t)[/tex]. So,
[tex]\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) = (2x, 2y)[/tex] and
[tex]r'(t) = (1, 2t)[/tex]

Substituting into the chain rule:
[tex]F'(t) = \nabla f(r(t)) \cdot r'(t) = [/tex]
[tex] = (2t, 2t^2) \cdot (1, 2t) = [/tex]
[tex] = 2t + 4t^3 [/tex]

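And trivially, we need no chain rule for the second derivative: [tex]F''(t) = [F'(t)]' = 12t^2 + 2[/tex]. (As a check, [tex]F(t) = t^2 + t^4[/tex], and differentiating it directly gives the same [tex]F'(t)[/tex].)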
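The same computation can be checked in a few lines of Python. This is a sketch of my own (function names are hypothetical): it evaluates [tex]\nabla f(r(t)) \cdot r'(t)[/tex] for [tex]f(x, y) = x^2 + y^2[/tex] and [tex]r(t) = (t, t^2)[/tex], compares it with the derivative of [tex]F(t) = t^2 + t^4[/tex] taken directly, and uses integer inputs so the comparison is exact.

```python
# Worked example: F = f o r with f(x, y) = x^2 + y^2 and r(t) = (t, t^2).

def grad_f(x, y):
    # grad f = (df/dx, df/dy) = (2x, 2y)
    return (2 * x, 2 * y)


def r(t):
    return (t, t**2)


def r_prime(t):
    # Second component: d/dt t^2 = 2t
    return (1, 2 * t)


def F_prime_chain(t):
    # Chain rule: F'(t) = grad f(r(t)) . r'(t)
    gx, gy = grad_f(*r(t))
    dx, dy = r_prime(t)
    return gx * dx + gy * dy


def F_prime_direct(t):
    # F(t) = t^2 + t^4 differentiated directly
    return 2 * t + 4 * t**3


def F_second(t):
    # No chain rule needed: differentiate the polynomial F'(t)
    return 12 * t**2 + 2


for t in range(-3, 4):
    assert F_prime_chain(t) == F_prime_direct(t)

print([F_prime_chain(t) for t in range(4)])  # [0, 6, 36, 114]
```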

Chain Rule for vector fields ([tex]f : \mathbb{R}^n \rightarrow \mathbb{R}^m[/tex])

Now, let’s look at the general case when we have vector fields.
Chain Rule: Let [tex]f[/tex] and [tex]g[/tex] be vector fields such that the composition [tex]h = f \circ g[/tex] is defined in a neighborhood of a point [tex]a[/tex]. Assume that [tex]g[/tex] is differentiable at [tex]a[/tex], with total derivative [tex]g'(a)[/tex]. Let [tex]b = g(a)[/tex] and assume that [tex]f[/tex] is differentiable at [tex]b[/tex], with total derivative [tex]f'(b)[/tex]. Then [tex]h[/tex] is differentiable at [tex]a[/tex], and the total derivative [tex]h'(a)[/tex] is given by:
[tex]h'(a) = f'(b) \circ g'(a)[/tex],
the composition of the linear transformations [tex]f'(b)[/tex] and [tex]g'(a)[/tex].

Note that the derivatives of vector fields are represented by their Jacobian matrices, and that the composition of the linear transformations corresponds to the product of those Jacobian matrices.
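To see why composing linear transformations amounts to multiplying matrices, here is a small Python sketch with made-up integer matrices (purely hypothetical, unrelated to the exercise below). Applying [tex]B[/tex] and then [tex]A[/tex] to a vector gives the same result as applying the single product matrix [tex]AB[/tex]; the left factor plays the role of the outer map, just as [tex]f'(b)[/tex] sits on the left in [tex]f'(b) \circ g'(a)[/tex].

```python
# Composition of linear maps corresponds to multiplying their matrices.

def matmul(A, B):
    """Product of an (m x k) matrix A and a (k x n) matrix B, as lists of rows."""
    k = len(B)
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    n = len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(len(A))]


def apply(A, x):
    """Apply the linear map with matrix A to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical matrices: B maps R^3 -> R^2, A maps R^2 -> R^2.
A = [[1, 2], [3, 4]]
B = [[1, 0, 2], [0, 1, -1]]
x = [1, 2, 3]

print(apply(A, apply(B, x)))   # apply B, then A       -> [5, 17]
print(apply(matmul(A, B), x))  # one matrix for A o B  -> [5, 17]
```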

OK, so let’s work through a complete exercise (from a theoretical point of view, we already know how to do it).
Given:
[tex]f: \mathbb{R}^2 \rightarrow \mathbb{R}^2[/tex]
[tex]f(x, y) = (e^{x+2y}, \sin(y+2x))[/tex]
[tex]g: \mathbb{R}^3 \rightarrow \mathbb{R}^2[/tex]
[tex]g(u,v,w) = (u + 2v^2 + 3w^3, 2v-u^2)[/tex]

Let’s compute the Jacobian matrices of [tex]f[/tex] and [tex]g[/tex]:
[tex]D_f = \begin{pmatrix} \frac{\partial f_1}{\partial x} & \frac{\partial f_1}{\partial y} \\ \frac{\partial f_2}{\partial x} & \frac{\partial f_2}{\partial y} \end{pmatrix} = [/tex]
[tex] = \begin{pmatrix} e^{x+2y} & 2e^{x+2y} \\ 2\cos(y+2x) & \cos(y+2x) \end{pmatrix}[/tex]

[tex]D_g = \begin{pmatrix}\frac{\partial g_1}{\partial u} & \frac{\partial g_1}{\partial v} & \frac{\partial g_1}{\partial w}\\ \frac{\partial g_2}{\partial u} & \frac{\partial g_2}{\partial v} & \frac{\partial g_2}{\partial w}\end{pmatrix} = [/tex]
[tex] = \begin{pmatrix}1 & 4v & 9w^2 \\ -2u & 2 & 0 \end{pmatrix}[/tex]
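Hand-computed Jacobians are easy to get wrong, so a quick numerical comparison is reassuring. The Python sketch below (names are my own, hypothetical choices) encodes [tex]D_f[/tex] and [tex]D_g[/tex] exactly as derived above and compares them with a central-difference Jacobian at a couple of sample points; the maximum entrywise difference should be tiny.

```python
import math


def f(x, y):
    return (math.exp(x + 2 * y), math.sin(y + 2 * x))


def g(u, v, w):
    return (u + 2 * v**2 + 3 * w**3, 2 * v - u**2)


def jacobian_f(x, y):
    # Rows: f_1, f_2; columns: d/dx, d/dy
    e = math.exp(x + 2 * y)
    c = math.cos(y + 2 * x)
    return [[e, 2 * e],
            [2 * c, c]]


def jacobian_g(u, v, w):
    # Rows: g_1, g_2; columns: d/du, d/dv, d/dw
    return [[1, 4 * v, 9 * w**2],
            [-2 * u, 2, 0]]


def numerical_jacobian(F, p, h=1e-6):
    """Central-difference Jacobian of F at the point p (a tuple of numbers)."""
    m = len(F(*p))
    J = [[0.0] * len(p) for _ in range(m)]
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = F(*plus), F(*minus)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


print(max_abs_diff(jacobian_f(0.3, -0.2), numerical_jacobian(f, (0.3, -0.2))))
print(max_abs_diff(jacobian_g(1, -1, 1), numerical_jacobian(g, (1, -1, 1))))
```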

Now, define [tex]h(u,v,w) = f(g(u,v,w))[/tex] and let’s compute [tex]D_h(1,-1,1)[/tex]. Well, well… here’s where our chain rule comes in:
[tex]h'(1,-1,1) = f'(g(1,-1,1)) \circ g'(1,-1,1)[/tex]

Substituting into the expression for [tex]g[/tex]:
[tex]g(1,-1,1) = (1+2+3, -2-1) = (6,-3)[/tex]

Substituting [tex]g(1,-1,1) = (6,-3)[/tex] into the Jacobian matrix computed previously for [tex]f[/tex] (at this point [tex]x + 2y = 0[/tex] and [tex]y + 2x = 9[/tex]):
[tex]f'(g(1,-1,1)) = f'(6,-3) = \begin{pmatrix} 1 & 2 \\ 2\cos(9) & \cos(9) \end{pmatrix}[/tex]

Substituting [tex](u,v,w) = (1,-1,1)[/tex] into the Jacobian matrix computed previously for [tex]g[/tex]:
[tex]g'(1,-1,1) = \begin{pmatrix} 1 & -4 & 9 \\ -2 & 2 & 0 \end{pmatrix}[/tex]

Now that we have computed all the values the chain rule needs, applying it is just composing the linear transformations given by the derivatives of [tex]f[/tex] and [tex]g[/tex], which is another way of saying that we multiply their Jacobian matrices:
[tex]h'(1,-1,1) = \begin{pmatrix} 1 & 2 \\ 2\cos(9) & \cos(9) \end{pmatrix} \cdot \begin{pmatrix} 1 & -4 & 9 \\ -2 & 2 & 0 \end{pmatrix} = [/tex]
[tex] = \begin{pmatrix}-3 & 0 & 9 \\ 0 & -6\cos(9) & 18\cos(9) \end{pmatrix}[/tex]
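The whole exercise fits in a short Python sketch (helper names are hypothetical, and the plain list-of-rows `matmul` stands in for any matrix library). It computes [tex]b = g(1,-1,1)[/tex], evaluates both Jacobians and multiplies them in the order [tex]f'(b) \cdot g'(a)[/tex]; the result should match the matrix above, with [tex]\cos(9)[/tex] evaluated in radians.

```python
import math


def g(u, v, w):
    return (u + 2 * v**2 + 3 * w**3, 2 * v - u**2)


def jacobian_f(x, y):
    # Jacobian of f(x, y) = (e^{x+2y}, sin(y+2x))
    e = math.exp(x + 2 * y)
    c = math.cos(y + 2 * x)
    return [[e, 2 * e], [2 * c, c]]


def jacobian_g(u, v, w):
    # Jacobian of g(u, v, w) = (u + 2v^2 + 3w^3, 2v - u^2)
    return [[1, 4 * v, 9 * w**2], [-2 * u, 2, 0]]


def matmul(A, B):
    # Row-by-column product, i.e. composing the two linear maps
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


a = (1, -1, 1)
b = g(*a)                                    # (6, -3)
Jh = matmul(jacobian_f(*b), jacobian_g(*a))  # h'(a) = f'(b) o g'(a)

c9 = math.cos(9)
expected = [[-3, 0, 9], [0, -6 * c9, 18 * c9]]
for row in Jh:
    print(row)
```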

And we’re done… :-)
Computing Jacobians and working with these kinds of tools is extremely helpful in Geometry, Differential Calculus and some branches of Physics.

