First-Order and Higher-Order Gradients in TensorFlow
In machine learning, a solid understanding of the gradients of a function helps you ensure that your model trains well. tf.GradientTape is a versatile TensorFlow tool for computing both first-order and higher-order gradients. Let's discuss each in turn.
First-Order Gradients
Think of building a neural network model as climbing a hill: the steeper the slope, the faster we want to ascend. First-order gradients, often referred to simply as gradients (∇), guide us in this ascent. Mathematically, for a scalar function f(x) of a vector x = (x₁, …, xₙ), the first-order gradient is the vector of partial derivatives:

∇f(x) = [∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ]
You do not need to work this out by hand: tf.GradientTape computes it for us by recording operations as they happen during the forward pass. This basic capability is what lets our models learn and adapt with each iteration.
Python
import tensorflow as tf

x = tf.Variable(5.0)
y = tf.Variable(2.0)

def f(x, y):
    return 2 * x ** 3 + 5 * y ** 2 + 11 * x + 5

# Calculate derivative w.r.t. x
with tf.GradientTape() as tape:
    z = f(x, y)
dx = tape.gradient(z, x)

# Calculate derivative w.r.t. y (a non-persistent tape can only be
# queried once, so create a new tape)
with tf.GradientTape() as tape:
    z = f(x, y)
dy = tape.gradient(z, y)

print("Partial derivative of f with respect to x:", dx.numpy())
print("Partial derivative of f with respect to y:", dy.numpy())
Output:
Partial derivative of f with respect to x: 161.0
Partial derivative of f with respect to y: 20.0
Higher-Order Gradients
Now that you have a good understanding of first-order gradients, let's take a look at higher-order gradients. First-order derivatives tell us about the slope of our hill, whereas higher-order derivatives give us insight into its curvature. TensorFlow lets us compute not only first but also second, third, and nth derivatives seamlessly. Mathematically, the second-order gradient of f(x) is the Hessian matrix, whose entries are:

H(f)ᵢⱼ = ∂²f / (∂xᵢ ∂xⱼ)
In other words, we can measure not only how steep our hill is, but also how its steepness is changing. This enables more intricate computations and advanced optimization techniques: tf.GradientTape can help us refine the model architecture as well as explore sophisticated algorithms.
For automatic differentiation, the operations inside the tf.GradientTape context manager are recorded. If gradients are computed inside that context, the gradient computation itself is recorded as well. As a result, the same API works for higher-order gradients too.
Consider the following example:
Python
import tensorflow as tf

# Define variables for x and y
x = tf.Variable(5.0)
y = tf.Variable(2.0)

# Define the function
def f(x, y):
    return 2 * x ** 3 + 5 * y ** 2 + 11 * x * y + 5

# Create a persistent GradientTape so it can be queried multiple times
with tf.GradientTape(persistent=True) as tape:
    z = f(x, y)
    # Compute all derivatives inside the tape context, so the gradient
    # computations themselves are also recorded
    dx = tape.gradient(z, x)    # first-order partial w.r.t. x: 6x^2 + 11y
    dy = tape.gradient(z, y)    # first-order partial w.r.t. y: 10y + 11x
    dxx = tape.gradient(dx, x)  # second-order partial w.r.t. x: 12x
    dyy = tape.gradient(dy, y)  # second-order partial w.r.t. y: 10
    dxy = tape.gradient(dx, y)  # mixed derivative: 11
    dyx = tape.gradient(dy, x)  # mixed derivative: 11

# Evaluate them at x = 5.0, y = 2.0
print("Partial derivative of f w.r.t. x:", dx.numpy())
print("Partial derivative of f w.r.t. y:", dy.numpy())
print("Second-order derivative of f w.r.t. x (d^2f/dx^2):", dxx.numpy())
print("Second-order derivative of f w.r.t. y (d^2f/dy^2):", dyy.numpy())
print("Mixed derivative (d^2f/dxdy):", dxy.numpy())
print("Mixed derivative (d^2f/dydx):", dyx.numpy())

# Delete the tape explicitly to release its resources
del tape
Output:
Partial derivative of f w.r.t. x: 172.0
Partial derivative of f w.r.t. y: 75.0
Second-order derivative of f w.r.t. x (d^2f/dx^2): 60.0
Second-order derivative of f w.r.t. y (d^2f/dy^2): 10.0
Mixed derivative (d^2f/dxdy): 11.0
Mixed derivative (d^2f/dydx): 11.0
Although this pattern does give you the second derivative of a scalar function, it does not generalize to producing a Hessian matrix, because tf.GradientTape.gradient only computes the gradient of a scalar.
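If you do need a full Hessian, one option is to compute the gradient inside an outer tape and then call tf.GradientTape.jacobian on it. Here is a minimal sketch using a small two-variable function chosen just for illustration:

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0])

with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        # f(x) = x1^3 + x2^3
        y = tf.reduce_sum(x ** 3)
    # Gradient: [3*x1^2, 3*x2^2]
    grad = inner_tape.gradient(y, x)
# Jacobian of the gradient is the Hessian: diag(6*x1, 6*x2)
hessian = outer_tape.jacobian(grad, x)

print(hessian.numpy())  # [[6. 0.], [0. 12.]] at x = [1.0, 2.0]
```

Since this f has no cross terms, the off-diagonal Hessian entries come out zero; for a general function tf.GradientTape.jacobian fills in the mixed partials as well.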
Nested calls to tf.GradientTape.gradient are a useful pattern when you want to calculate a scalar from a gradient and then use that scalar as the source for another gradient computation.
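As a minimal sketch of this nested-tape pattern (the function y = x³ here is chosen purely for illustration):

```python
import tensorflow as tf

x = tf.Variable(3.0)

# The outer tape records the gradient computation performed by the
# inner tape, so it can differentiate that result again
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x ** 3
    # First derivative: dy/dx = 3x^2
    dy_dx = inner_tape.gradient(y, x)
# Second derivative: d2y/dx2 = 6x
d2y_dx2 = outer_tape.gradient(dy_dx, x)

print(dy_dx.numpy())    # 27.0 at x = 3
print(d2y_dx2.numpy())  # 18.0 at x = 3
```

Compared with a single persistent tape, nesting makes explicit which tape watches which computation, which is handy when the inner scalar (here dy_dx) feeds into a larger loss.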