Illustration 2

Build a handwritten digit classification model using a custom optimizer.

Step 1: 

Import the necessary libraries.

Python3




import torch
import torch.nn as nn
from torch.optim import Optimizer
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import math
import matplotlib.pyplot as plt

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


Step 2: 

Now, we'll load the MNIST dataset and create a DataLoader for it.

Python3




# Load the MNIST training set and wrap it in a DataLoader
dataset = MNIST(root='.', train=True, download=True, transform=ToTensor())
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
print(dataloader.dataset)


Output:

Dataset MNIST
    Number of datapoints: 60000
    Root location: .
    Split: Train
    StandardTransform
Transform: ToTensor()

Step 3:

Let’s visualize the first batch of our dataset.

Python3




# Visualize the first batch of images with their labels
for i, batch in enumerate(dataloader):
    figure = plt.figure(figsize=(16, 16))
    img, label = batch
    for j in range(img.shape[0]):
        figure.add_subplot(8, 8, j + 1)
        plt.imshow(img[j].squeeze(), cmap="gray")
        plt.title(label[j].item())
        plt.axis("off")

    plt.show()
    break


Output:

First batch of input images

Step 4: 

Next, we'll define our model architecture: a simple fully connected network with two hidden layers.

Python3




class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Two hidden layers of 512 units, 10 output classes (digits 0-9)
        self.fc1 = nn.Linear(28*28, 512)
        self.fc2 = nn.Linear(512, 512)
        self.fc3 = nn.Linear(512, 10)

    def forward(self, x):
        # Flatten the 28x28 images into vectors
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # Return raw logits; CrossEntropyLoss applies the softmax internally
        x = self.fc3(x)
        return x

# Model
model = Net().to(device)
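As a quick, optional sanity check (not part of the original flow), we can pass a dummy batch through the network and confirm that it produces one logit per class for each image:

Python3

# Sanity check: a batch of 32 MNIST-sized images should map to a 32 x 10 tensor of logits
dummy = torch.randn(32, 1, 28, 28).to(device)
print(model(dummy).shape)  # torch.Size([32, 10])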


Step 5:

Next, we'll define our loss function. In this case, we'll use the cross-entropy loss.

Python3




# Loss function
loss_fn = nn.CrossEntropyLoss()


Step 6:

Next, we'll define our custom optimizer.

Python3




# Define a custom optimizer by subclassing the built-in Adam
class MyAdam(torch.optim.Adam):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), weight_decay=0):
        super().__init__(params, lr=lr, betas=betas)
        self.weight_decay = weight_decay

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad
                if grad.is_sparse:
                    raise RuntimeError("Adam does not support sparse gradients")

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state["step"] = 0
                    # Exponential moving average of gradient values
                    state["exp_avg"] = torch.zeros_like(p)
                    # Exponential moving average of squared gradient values
                    state["exp_avg_sq"] = torch.zeros_like(p)

                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                beta1, beta2 = group["betas"]

                state["step"] += 1

                # L2-style weight decay: add weight_decay * p to the gradient
                if self.weight_decay != 0:
                    grad = grad.add(p, alpha=self.weight_decay)

                # Decay the first and second moment running averages
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group["eps"])

                # Bias corrections are folded into the step size
                bias_correction1 = 1 - beta1 ** state["step"]
                bias_correction2 = 1 - beta2 ** state["step"]
                step_size = group["lr"] * math.sqrt(bias_correction2) / bias_correction1

                p.addcdiv_(exp_avg, denom, value=-step_size)

        return loss

# Optimizer
optimizer = MyAdam(model.parameters(), weight_decay=0.00001)
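For reference, the step() method above performs the standard Adam update, with an L2-style weight decay term folded into the gradient and the bias corrections folded into the step size:

grad_t = grad_t + weight_decay * param_{t-1}
m_t = beta1 * m_{t-1} + (1 - beta1) * grad_t
v_t = beta2 * v_{t-1} + (1 - beta2) * grad_t^2
param_t = param_{t-1} - lr * sqrt(1 - beta2^t) / (1 - beta1^t) * m_t / (sqrt(v_t) + eps)

Here m_t and v_t correspond to the exp_avg and exp_avg_sq buffers maintained in the optimizer state.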


Step 7:

Now, we'll train the model with the custom optimizer and plot the training loss.

Python3




# Training loop
num_epochs = 10
epoch_losses = []
for i in range(num_epochs):
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        loss = loss_fn(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Record the loss of the last batch of each epoch
    epoch_losses.append(loss.item())
    print(i, '>> Loss :', loss.item())

plt.plot(epoch_losses, 'ro-')
plt.title('Loss over epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
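Note that the value recorded for each epoch is only the loss of the final batch in that epoch, so the curve is a fairly noisy signal; averaging the loss over all batches in an epoch would give a smoother picture of training progress.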


Output:

0 >> Loss : nan
1 >> Loss : 1.2611686178923354e-44
2 >> Loss : nan
3 >> Loss : 8.407790785948902e-45
4 >> Loss : nan
5 >> Loss : 1.401298464324817e-45
6 >> Loss : nan
7 >> Loss : 0.0
8 >> Loss : nan
9 >> Loss : 1.401298464324817e-45

Training loss plot

Note: Loss values will differ across runs and devices.

Custom Optimizers in PyTorch

In PyTorch, an optimizer is a concrete implementation of an optimization algorithm that is used to update the parameters of a neural network. The optimizer updates the parameters so that the loss of the neural network is minimized. PyTorch provides various built-in optimizers such as SGD, Adam, Adagrad, etc. that can be used out of the box. However, in some cases the built-in optimizers may not be suitable for a particular problem or may not perform well. In such cases, one can create a custom optimizer.

A custom optimizer in PyTorch is a class that inherits from the torch.optim.Optimizer base class. The custom optimizer should implement the __init__ and step methods: __init__ initializes the optimizer's internal state, and step updates the parameters of the model.
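As a minimal sketch of that structure (here a plain gradient-descent update, using the illustrative class name MySGD), a custom optimizer built directly on torch.optim.Optimizer can look like this:

Python3

import torch
from torch.optim import Optimizer

class MySGD(Optimizer):
    def __init__(self, params, lr=0.01):
        # Hyperparameters go into 'defaults' and are copied into every parameter group
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Plain gradient descent: p <- p - lr * grad
                p.add_(p.grad, alpha=-group['lr'])
        return loss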
