Implementing Custom Gradients
- Define a Custom Operation: custom_op(x) is a simple operation that squares the input x.
- Define the Gradient Function: it computes the gradient of custom_op with respect to its input x. Since custom_op(x) = x^2, the gradient is 2 * x.
- Use tf.custom_gradient to Define the Custom Operation with Its Gradient: tf.custom_gradient is a decorator that lets you define a custom operation along with its gradient function. Inside custom_op_with_grad, we compute y using custom_op(x) and define the gradient function grad(dy), which computes the gradient of the output with respect to x.
- Example Usage and Gradient Computation: we compute the gradient of custom_op both with TensorFlow’s automatic differentiation (grad_auto) and with the custom gradient function (grad_custom) defined earlier.
- Print the Results and confirm that the two gradients agree, as shown in the sketch after this list.
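The steps above correspond to the short sketch below. The exact code is not listed in this walkthrough, so treat it as a minimal, assumed implementation: the names custom_op, custom_op_with_grad, grad_auto and grad_custom follow the step descriptions, while the evaluation point x = 3.0 is an arbitrary choice for illustration.
Python3
import tensorflow as tf

# Custom operation that squares its input
def custom_op(x):
    return tf.square(x)

# Analytical gradient: d/dx (x^2) = 2 * x
def custom_op_grad(x):
    return 2.0 * x

@tf.custom_gradient
def custom_op_with_grad(x):
    y = custom_op(x)
    def grad(dy):
        # Chain rule: upstream gradient dy times local gradient 2 * x
        return dy * custom_op_grad(x)
    return y, grad

# Example usage: evaluation point chosen for illustration
x = tf.constant(3.0)

# Gradient via TensorFlow's automatic differentiation
with tf.GradientTape() as tape:
    tape.watch(x)
    y_auto = custom_op(x)
grad_auto = tape.gradient(y_auto, x)

# Gradient via the custom gradient function
with tf.GradientTape() as tape:
    tape.watch(x)
    y_custom = custom_op_with_grad(x)
grad_custom = tape.gradient(y_custom, x)

print("Automatic gradient:", grad_auto.numpy())  # 6.0
print("Custom gradient:", grad_custom.numpy())   # 6.0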
The following example compares the performance of a simple neural network for classifying handwritten digits (the MNIST dataset) using custom and default gradients.
1. Libraries and Dataset
Python3
import tensorflow as tf
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
2. Custom Gradient Function:
Next, we define a custom gradient for the rectified linear unit (ReLU) activation function. ReLU is already supported in TensorFlow, but a simplified custom version illustrates the mechanism.
- custom_relu(x): computes the ReLU activation function, which returns x if x is greater than or equal to zero, and zero otherwise. It uses TensorFlow’s tf.maximum function to achieve this.
- custom_relu_grad(x): computes the gradient of the ReLU function. It returns a tensor with the same shape as x, where each element is 1.0 if the corresponding element in x is greater than zero, and 0.0 otherwise. It uses TensorFlow’s tf.where function for this purpose.
- @tf.custom_gradient: a decorator that allows you to define a custom gradient for a TensorFlow operation. It wraps the custom_relu_op function, which becomes the custom operation with a defined gradient.
- custom_relu_op(x): computes the ReLU activation using custom_relu(x). It also defines a gradient function grad(dy), where dy is the gradient of some downstream value with respect to the output of custom_relu_op.
- grad(dy): computes the gradient of the output of custom_relu_op with respect to its input x. It uses custom_relu_grad(x) to compute the local gradient and multiplies it by dy to propagate the gradient backward through the operation.
Python3
def custom_relu(x):
    # Forward pass: element-wise max(x, 0)
    return tf.maximum(x, 0.0)

def custom_relu_grad(x):
    # Gradient of ReLU: 1.0 where x > 0, 0.0 otherwise
    return tf.where(x > 0, tf.ones_like(x), tf.zeros_like(x))

@tf.custom_gradient
def custom_relu_op(x):
    y = custom_relu(x)
    def grad(dy):
        # Multiply the upstream gradient dy by the local ReLU gradient
        return custom_relu_grad(x) * dy
    return y, grad
3. Model Definition:
- Model A uses the default ReLU activation function provided by TensorFlow.
- Model B uses the custom ReLU activation function (custom_relu_op) that we defined earlier.
Python3
# Model A (Default ReLU)
model_a = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Model B (Custom ReLU)
model_b = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation=custom_relu_op),
    tf.keras.layers.Dense(10, activation='softmax')
])
4. Training:
Python3
model_a.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])
model_b.compile(optimizer='adam',
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

model_a.fit(x_train, y_train, epochs=5)
model_b.fit(x_train, y_train, epochs=5)

test_loss_a, test_acc_a = model_a.evaluate(x_test, y_test)
test_loss_b, test_acc_b = model_b.evaluate(x_test, y_test)
Output:
Epoch 1/5
1875/1875 [==============================] - 10s 4ms/step - loss: 0.2645 - accuracy: 0.9246
Epoch 2/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1155 - accuracy: 0.9656
Epoch 3/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0797 - accuracy: 0.9751
Epoch 4/5
1875/1875 [==============================] - 8s 5ms/step - loss: 0.0596 - accuracy: 0.9817
Epoch 5/5
1875/1875 [==============================] - 6s 3ms/step - loss: 0.0461 - accuracy: 0.9859
Epoch 1/5
1875/1875 [==============================] - 7s 2ms/step - loss: 0.2581 - accuracy: 0.9256
5. Evaluation:
Python3
print ( "Model A (Default ReLU): Test Accuracy:" , test_acc_a) print ( "Model B (Custom ReLU): Test Accuracy:" , test_acc_b) |
Output:
Model A (Default ReLU): Test Accuracy: 0.9751999974250793
Model B (Custom ReLU): Test Accuracy: 0.9776999950408936
Both models perform well on the test dataset; in terms of test accuracy, Model B (Custom ReLU) marginally outperforms Model A (Default ReLU). Because the custom ReLU computes the same forward values and gradients as the built-in version, this small gap is most likely due to run-to-run randomness in training (weight initialization and data shuffling) rather than the custom gradient itself.
The practical difference in accuracy between the two models is negligible. What the comparison does show is that a custom activation function such as custom_relu_op, defined with tf.custom_gradient, can match the performance of TensorFlow’s built-in implementation.
Custom gradients in TensorFlow
Custom gradients in TensorFlow allow you to define your own gradient functions for operations, giving you control over how gradients are computed for complex or non-standard operations. This is useful for tasks such as implementing custom loss functions, incorporating domain-specific knowledge into the gradient computation, or handling operations that TensorFlow does not natively support.
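As an illustration of this flexibility, the sketch below defines a hypothetical clipped_identity operation (not part of the MNIST example above) that behaves as the identity in the forward pass but clips the incoming gradient in the backward pass; the clip range of [-0.5, 0.5] is an arbitrary choice for demonstration.
Python3
import tensorflow as tf

@tf.custom_gradient
def clipped_identity(x):
    # Forward pass: return x unchanged
    def grad(dy):
        # Backward pass: clip the upstream gradient to [-0.5, 0.5]
        return tf.clip_by_value(dy, -0.5, 0.5)
    return x, grad

x = tf.constant([2.0, -3.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(tf.square(clipped_identity(x)))
grad = tape.gradient(y, x)

# The unclipped gradients 2*x = [4.0, -6.0] are clipped to [0.5, -0.5]
print(grad.numpy())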