Illustration 2
Build a handwritten digit classification model using a custom optimizer
Step 1:
Import the necessary libraries
Python3
import math

import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from torch.optim import Optimizer
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from torch.utils.tensorboard import SummaryWriter

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Step 2:
Now, we’ll load the MNIST dataset and create a data loader for it.
Python3
# Loading the dataset
dataset = MNIST(root='.', train=True, download=True, transform=ToTensor())
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
dataloader.dataset
Output:
Dataset MNIST
    Number of datapoints: 60000
    Root location: .
    Split: Train
    StandardTransform
Transform: ToTensor()
Step 3:
Let’s visualize the first batch of our dataset.
Python3
# Display the images of the first batch in a grid, with their labels as titles
for i, batch in enumerate(dataloader):
    figure = plt.figure(figsize=(16, 16))
    img, label = batch
    for j in range(img.shape[0]):
        figure.add_subplot(8, 8, j + 1)
        plt.imshow(img[j].squeeze(), cmap="gray")
        plt.title(label[j].item())
        plt.axis("off")
    plt.show()
    break
Output: a grid of the 32 grayscale digit images in the first batch, each titled with its label.
Step 4:
Next, we’ll define our model architecture: a simple fully connected network with two hidden layers.
Python3
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 512)
        self.fc2 = nn.Linear(512, 512)
        self.fc3 = nn.Linear(512, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x


# Model
model = Net().to(device)
Step 5:
Next, we’ll define our loss function; in this case, we’ll use the cross-entropy loss.
Python3
# Loss function
loss_fn = nn.CrossEntropyLoss()
Step 6:
Next, we’ll define our custom optimizer. It subclasses torch.optim.Adam, overrides the step() method, and adds an L2-style weight decay term.
Python3
# Define custom optimizer
class MyAdam(torch.optim.Adam):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), weight_decay=0):
        super().__init__(params, lr=lr, betas=betas)
        self.weight_decay = weight_decay

    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                if grad.is_sparse:
                    raise RuntimeError("Adam does not support sparse gradients")

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state["step"] = 0
                    # Exponential moving average of gradient values
                    state["exp_avg"] = torch.zeros_like(p.data)
                    # Exponential moving average of squared gradient values
                    state["exp_avg_sq"] = torch.zeros_like(p.data)

                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                beta1, beta2 = group["betas"]
                state["step"] += 1

                # L2-style weight decay: add weight_decay * p to the gradient
                if self.weight_decay != 0:
                    grad = grad.add(p.data, alpha=self.weight_decay)

                # Decay the first and second moment running average coefficients
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group["eps"])

                bias_correction1 = 1 - beta1 ** state["step"]
                bias_correction2 = 1 - beta2 ** state["step"]
                step_size = group["lr"] * math.sqrt(bias_correction2) / bias_correction1

                p.data.addcdiv_(exp_avg, denom, value=-step_size)


# Optimizer
optimizer = MyAdam(model.parameters(), weight_decay=0.00001)
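The step() method above follows the standard Adam update: it maintains exponential moving averages of the gradient and of its square, applies bias correction through step_size, and divides by the square root of the second-moment estimate. Weight decay is applied by adding weight_decay * p to the gradient (classic L2 regularization), rather than the decoupled form used by AdamW.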
Step 7:
Now, train the model with the custom optimizer and plot the training loss.
Python3
# Training loop
num_epochs = 10

for i in range(num_epochs):
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        loss = loss_fn(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # scheduler.step()
    # Plot and print the loss of the last batch of each epoch
    plt.plot(i, loss.item(), 'ro-')
    print(i, '>> Loss :', loss.item())

plt.title('Losses over iterations')
plt.xlabel('iterations')
plt.ylabel('Losses')
plt.show()
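After training, it can be useful to check whether the model actually learned to classify digits. The short sketch below is not part of the original article; it reuses the training dataloader only for convenience (for a proper evaluation you would build a separate loader over the MNIST test split).

Python3

# A minimal evaluation sketch (assumption: reusing the training dataloader)
correct = 0
total = 0
model.eval()
with torch.no_grad():
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        preds = outputs.argmax(dim=1)  # predicted digit for each image
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print('Accuracy:', correct / total)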
Output:
0 >> Loss : nan
1 >> Loss : 1.2611686178923354e-44
2 >> Loss : nan
3 >> Loss : 8.407790785948902e-45
4 >> Loss : nan
5 >> Loss : 1.401298464324817e-45
6 >> Loss : nan
7 >> Loss : 0.0
8 >> Loss : nan
9 >> Loss : 1.401298464324817e-45
Note: Losses will be different for different devices.
Custom Optimizers in PyTorch
In PyTorch, an optimizer is a specific implementation of the optimization algorithm that is used to update the parameters of a neural network. The optimizer updates the parameters in such a way that the loss of the neural network is minimized. PyTorch provides various built-in optimizers such as SGD, Adam, Adagrad, etc. that can be used out of the box. However, in some cases, the built-in optimizers may not be suitable for a particular problem or may not perform well. In such cases, one can create their own custom optimizer.
A custom optimizer in PyTorch is a class that inherits from the torch.optim.Optimizer base class. The custom optimizer should implement the __init__ and step methods. The __init__ method is used to initialize the optimizer’s internal state, and the step method is used to update the parameters of the model.
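To illustrate this structure, here is a minimal sketch of a custom optimizer that subclasses torch.optim.Optimizer and performs plain gradient descent. The class name MySGD and its single lr hyperparameter are illustrative choices, not part of the original article.

Python3

import torch
from torch.optim import Optimizer


class MySGD(Optimizer):
    def __init__(self, params, lr=0.01):
        # Store hyperparameters in 'defaults'; the base class copies them
        # into every parameter group.
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Plain gradient descent update: p <- p - lr * grad
                p.add_(p.grad, alpha=-group['lr'])
        return loss

Such a class can then be used like any built-in optimizer, for example optimizer = MySGD(model.parameters(), lr=0.01).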