First-Order and Higher-Order Gradients in TensorFlow
In machine learning, a solid understanding of the gradients of a function helps you ensure that your model trains well. tf.GradientTape is a versatile TensorFlow tool for computing both first-order and higher-order gradients. Let's discuss each in turn.
First-Order Gradients
Think of building a neural network model as climbing a hill: the steeper the slope, the faster we want to ascend. First-order gradients, often referred to simply as gradients (∇), guide us in this ascent. Mathematically, for a scalar function f(x) of a vector x = (x₁, …, xₙ), the first-order gradient is the vector of partial derivatives:

∇f(x) = [∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ]
You do not need to work this out by hand: tf.GradientTape computes it for us by recording operations as they happen during the forward pass. This basic capability is what lets our models learn and adapt with each iteration.
Python
import tensorflow as tf

x = tf.Variable(5.0)
y = tf.Variable(2.0)

def f(x, y):
    return 2 * x ** 3 + 5 * y ** 2 + 11 * x + 5

# Calculate derivative w.r.t. x
with tf.GradientTape() as tape:
    z = f(x, y)
dx = tape.gradient(z, x)

# Calculate derivative w.r.t. y (a non-persistent tape can only be
# queried once, so create a new tape)
with tf.GradientTape() as tape:
    z = f(x, y)
dy = tape.gradient(z, y)

print("Partial derivative of f with respect to x:", dx.numpy())
print("Partial derivative of f with respect to y:", dy.numpy())
Output:
Partial derivative of f with respect to x: 161.0
Partial derivative of f with respect to y: 20.0
Higher-Order Gradients
Now that you have a good understanding of first-order gradients, let's take a look at higher-order gradients. First-order derivatives tell us about the slope of our hill, whereas higher-order derivatives give us insight into its curvature. TensorFlow lets us compute not only first but also second, third, and nth derivatives seamlessly. Mathematically, the second-order gradient of f(x) is the Hessian matrix, whose entries are:

H(f)ᵢⱼ = ∂²f / (∂xᵢ ∂xⱼ)
In other words, we can measure not only how steep our hill is, but also how its steepness is changing. This enables more intricate computations and advanced optimization techniques: tf.GradientTape can help us refine the model architecture as well as explore sophisticated algorithms.
For automatic differentiation, the operations inside the tf.GradientTape context manager are recorded. If gradients are computed inside that context, the gradient computation itself is recorded as well. As a result, the same API works for higher-order gradients too.
Consider the following example:
Python
import tensorflow as tf

# Define variables for x and y
x = tf.Variable(5.0)
y = tf.Variable(2.0)

# Define the function
def f(x, y):
    return 2 * x ** 3 + 5 * y ** 2 + 11 * x * y + 5

# Create a persistent GradientTape so it can be queried multiple times
with tf.GradientTape(persistent=True) as tape:
    z = f(x, y)
    # Compute all derivatives inside the tape context, so the gradient
    # computations themselves are also recorded
    dx = tape.gradient(z, x)    # first-order partial w.r.t. x: 6x^2 + 11y
    dy = tape.gradient(z, y)    # first-order partial w.r.t. y: 10y + 11x
    dxx = tape.gradient(dx, x)  # second-order partial w.r.t. x: 12x
    dyy = tape.gradient(dy, y)  # second-order partial w.r.t. y: 10
    dxy = tape.gradient(dx, y)  # mixed derivative: 11
    dyx = tape.gradient(dy, x)  # mixed derivative: 11

# Evaluate them at x = 5.0, y = 2.0
print("Partial derivative of f w.r.t. x:", dx.numpy())
print("Partial derivative of f w.r.t. y:", dy.numpy())
print("Second-order derivative of f w.r.t. x (d^2f/dx^2):", dxx.numpy())
print("Second-order derivative of f w.r.t. y (d^2f/dy^2):", dyy.numpy())
print("Mixed derivative (d^2f/dxdy):", dxy.numpy())
print("Mixed derivative (d^2f/dydx):", dyx.numpy())

# Delete the tape explicitly to release its resources
del tape
Output:
Partial derivative of f w.r.t. x: 172.0
Partial derivative of f w.r.t. y: 75.0
Second-order derivative of f w.r.t. x (d^2f/dx^2): 60.0
Second-order derivative of f w.r.t. y (d^2f/dy^2): 10.0
Mixed derivative (d^2f/dxdy): 11.0
Mixed derivative (d^2f/dydx): 11.0
Although this pattern does give you the second derivative of a scalar function, it does not generalize to producing a Hessian matrix, because tf.GradientTape.gradient only computes the gradient of a scalar.
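If you do need a full Hessian, one option is to compute the gradient inside an outer tape and then call tf.GradientTape.jacobian on it. Here is a minimal sketch using a small two-variable function chosen just for illustration:

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0])

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        # f(x) = x1^3 + x2^3
        y = tf.reduce_sum(x ** 3)
    # Gradient: [3*x1^2, 3*x2^2]
    grad = inner_tape.gradient(y, x)
# Jacobian of the gradient is the Hessian: diag(6*x1, 6*x2)
hessian = outer_tape.jacobian(grad, x)

print(hessian.numpy())  # [[6. 0.], [0. 12.]] at x = [1.0, 2.0]
```

Since this f has no cross terms, the off-diagonal Hessian entries come out zero; for a general function tf.GradientTape.jacobian fills in the mixed partials as well.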
Nested calls to tf.GradientTape.gradient are a useful pattern when you want to calculate a scalar from a gradient and then use that scalar as the source for another gradient computation.
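As a minimal sketch of this nested-tape pattern (the function y = x³ here is chosen purely for illustration):

```python
import tensorflow as tf

x = tf.Variable(3.0)

# The outer tape records the gradient computation performed by the
# inner tape, so it can differentiate that result again
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x ** 3
    # First derivative: dy/dx = 3x^2
    dy_dx = inner_tape.gradient(y, x)
# Second derivative: d2y/dx2 = 6x
d2y_dx2 = outer_tape.gradient(dy_dx, x)

print(dy_dx.numpy())    # 27.0 at x = 3
print(d2y_dx2.numpy())  # 18.0 at x = 3
```

Compared with a single persistent tape, nesting makes explicit which tape watches which computation, which is handy when the inner scalar (here dy_dx) feeds into a larger loss.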