Customizing Optimizers

There are many ways to customize optimizers in PyTorch. Some of them are as follows:

Changing the learning rate schedule:

The learning rate of the optimizer can be changed during training using a learning rate scheduler. PyTorch provides several built-in schedulers such as torch.optim.lr_scheduler.StepLR and torch.optim.lr_scheduler.ExponentialLR. We can also create our own scheduler by inheriting from the torch.optim.lr_scheduler._LRScheduler class (a sketch of this appears after the code below).

In the code below, we use the torch.optim.lr_scheduler.StepLR scheduler, which multiplies the learning rate by a factor of gamma every step_size epochs (i.e., every step_size calls to scheduler.step()).

Python3

# Initialize an optimizer with a fixed learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
  
# Create a learning rate scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
  
num_epochs = 200
# In the training loop
for i in range(num_epochs):
    # Perform the training step
    optimizer.zero_grad()
      
    y_pred = model(X)
    loss = criterion(y_pred, y)
      
    loss.backward()
    optimizer.step()
    # Update the learning rate
    scheduler.step()
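
As noted above, we can also define our own schedule by subclassing torch.optim.lr_scheduler._LRScheduler and overriding its get_lr() method. Below is a minimal sketch; the class name HalvingLR and its decay_every argument are made up for this illustration (they are not part of PyTorch). It halves the learning rate of every parameter group once every decay_every epochs.

Python3

# A minimal sketch of a custom scheduler (illustrative only):
# halves the learning rate every `decay_every` epochs
class HalvingLR(torch.optim.lr_scheduler._LRScheduler):
    def __init__(self, optimizer, decay_every=10, last_epoch=-1):
        self.decay_every = decay_every
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # self.base_lrs holds the initial learning rate of each parameter group
        factor = 0.5 ** (self.last_epoch // self.decay_every)
        return [base_lr * factor for base_lr in self.base_lrs]

# Used exactly like a built-in scheduler: call scheduler.step() once per epoch
scheduler = HalvingLR(optimizer, decay_every=10)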


Adding regularization:

To add regularization to the optimizer, we can modify the step() method to include the regularization term in the update of the model parameters. For example, we can add L1 or L2 regularization by modifying the step() method to include a term that penalizes the absolute or squared values of the parameters, respectively.

Python3

import math
import torch

# Define custom optimizer: Adam with weight decay applied to the gradients
class MyAdam(torch.optim.Adam):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), weight_decay=0):
        super().__init__(params, lr=lr, betas=betas, weight_decay=weight_decay)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                if grad.is_sparse:
                    raise RuntimeError("Adam does not support sparse gradients")

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state["step"] = 0
                    # Exponential moving average of gradient values
                    state["exp_avg"] = torch.zeros_like(p.data)
                    # Exponential moving average of squared gradient values
                    state["exp_avg_sq"] = torch.zeros_like(p.data)

                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                beta1, beta2 = group["betas"]

                state["step"] += 1

                # Weight decay: add weight_decay * p to the gradient
                if group["weight_decay"] != 0:
                    grad = grad.add(p.data, alpha=group["weight_decay"])

                # Decay the first and second moment running averages
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group["eps"])

                # Bias-corrected step size
                bias_correction1 = 1 - beta1 ** state["step"]
                bias_correction2 = 1 - beta2 ** state["step"]
                step_size = group["lr"] * math.sqrt(bias_correction2) / bias_correction1

                p.data.addcdiv_(exp_avg, denom, value=-step_size)

        return loss

# Optimizer
optimizer = MyAdam(model.parameters(), weight_decay=0.00002)


In the above code, we create a custom Adam optimizer that includes weight decay regularization by adding a weight_decay parameter to the optimizer and modifying the step() method to include the weight decay term in the update of the parameters. The weight decay term is applied to the gradients by grad = grad.add(p.data, alpha=group["weight_decay"]), which penalizes large parameter values by shrinking their updates.
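
The same approach works for L1 regularization, which penalizes the absolute values of the parameters. The sketch below is a minimal illustration (the class name MySGDL1 and its l1_lambda argument are invented for this example, not part of PyTorch): it adds the L1 subgradient, l1_lambda * sign(p), to each parameter's gradient and then delegates to the standard SGD update.

Python3

# A minimal sketch of L1 regularization inside a custom optimizer
# (MySGDL1 and l1_lambda are illustrative, not part of PyTorch)
class MySGDL1(torch.optim.SGD):
    def __init__(self, params, lr=0.01, l1_lambda=1e-4):
        super().__init__(params, lr=lr)
        self.l1_lambda = l1_lambda

    def step(self, closure=None):
        # Add the L1 subgradient l1_lambda * sign(p) to each gradient,
        # then let the parent SGD perform its usual update
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is not None:
                    p.grad.data.add_(torch.sign(p.data), alpha=self.l1_lambda)
        return super().step(closure)

optimizer = MySGDL1(model.parameters(), lr=0.01, l1_lambda=1e-4)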

Implementing a new optimization algorithm: 

PyTorch provides several built-in optimization algorithms, such as SGD, Adam, and Adagrad. However, there are many other optimization algorithms that are not included in the library. By creating a custom optimizer, we can implement any optimization algorithm that we want.

Python3

class MyOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr=0.01):
        defaults = dict(lr=lr)
        super(MyOptimizer, self).__init__(params, defaults)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Update rule: subtract lr * grad**2
                # (uses the squared gradient instead of the gradient itself)
                p.data = p.data - group['lr'] * p.grad.data ** 2
        return loss

optimizer = MyOptimizer(model.parameters(), lr=0.001)


In this example, we created a new optimizer called MyOptimizer that updates the parameters using the squared gradient values instead of the gradients themselves. Note that this rule is purely illustrative: squaring discards the sign of the gradient, so every update moves the parameters in the negative direction.

Using multiple optimizers:

 In some cases, we may want to use different optimizers for different parts of the model. For example, we may want to use Adam for the parameters of the convolutional layers, and SGD for the parameters of the fully-connected layers. This can be achieved by creating multiple instances of the optimizer, one for each set of parameters.

Python3

# Define different optimizers for different parts of the model
params1 = model.conv_layers.parameters()
params2 = model.fc_layers.parameters()
  
optimizer1 = torch.optim.Adam(params1)
optimizer2 = torch.optim.SGD(params2, lr=0.01)
  
# In the training loop
for i in range(num_epochs):
    # Perform the training step
    ...
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    loss.backward()
    optimizer1.step()
    optimizer2.step()


In this example, we are using Adam optimizer for the parameters of the convolutional layers, and SGD optimizer with a fixed learning rate of 0.01 for the parameters of the fully-connected layers. This can help fine-tune the training of specific parts of the model.
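
A closely related option, when only the hyperparameters (rather than the algorithm) need to differ, is to pass several parameter groups to a single optimizer and give each group its own settings. The sketch below assumes the same model.conv_layers and model.fc_layers attributes as above.

Python3

# A single optimizer with per-group hyperparameters
# (assumes the same model.conv_layers / model.fc_layers as above)
optimizer = torch.optim.SGD([
    {'params': model.conv_layers.parameters(), 'lr': 0.001},
    {'params': model.fc_layers.parameters(), 'lr': 0.01},
])

# Inside the training loop, one zero_grad()/step() call updates both groups
optimizer.zero_grad()
loss.backward()
optimizer.step()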

Custom Optimizers in PyTorch

In PyTorch, an optimizer is a specific implementation of the optimization algorithm that is used to update the parameters of a neural network. The optimizer updates the parameters in such a way that the loss of the neural network is minimized. PyTorch provides various built-in optimizers such as SGD, Adam, Adagrad, etc. that can be used out of the box. However, in some cases, the built-in optimizers may not be suitable for a particular problem or may not perform well. In such cases, one can create their own custom optimizer.

A custom optimizer in PyTorch is a class that inherits from the torch.optim.Optimizer base class. The custom optimizer should implement the __init__ and step methods. The __init__ method is used to initialize the optimizer's internal state, and the step method is used to update the parameters of the model.
