Customizing Optimizers

There are many ways to customize optimizers in PyTorch. Some of them are as follows:

Changing the learning rate schedule:

The learning rate of the optimizer can be changed during training using a learning rate scheduler. PyTorch provides several built-in schedulers such as torch.optim.lr_scheduler.StepLR and torch.optim.lr_scheduler.ExponentialLR. We can also create our own scheduler by inheriting from the torch.optim.lr_scheduler._LRScheduler class (a sketch of this appears after the code below).

In the code below, we use the torch.optim.lr_scheduler.StepLR scheduler, which multiplies the learning rate by a factor of gamma every step_size epochs (i.e., every step_size calls to scheduler.step()).

Python3

# Initialize an optimizer with a fixed learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
  
# Create a learning rate scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
  
num_epochs = 200
# In the training loop
for i in range(num_epochs):
    # Perform the training step
    optimizer.zero_grad()
      
    y_pred = model(X)
    loss = criterion(y_pred, y)
      
    loss.backward()
    optimizer.step()
    # Update the learning rate
    scheduler.step()
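
As noted above, we can also define our own schedule by subclassing torch.optim.lr_scheduler._LRScheduler and overriding its get_lr() method. Below is a minimal sketch; the class name HalvingLR and its decay_every argument are made up for this illustration (they are not part of PyTorch). It halves the learning rate of every parameter group once every decay_every epochs.

Python3

# A minimal sketch of a custom scheduler (illustrative only):
# halves the learning rate every `decay_every` epochs
class HalvingLR(torch.optim.lr_scheduler._LRScheduler):
    def __init__(self, optimizer, decay_every=10, last_epoch=-1):
        self.decay_every = decay_every
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # self.base_lrs holds the initial learning rate of each parameter group
        factor = 0.5 ** (self.last_epoch // self.decay_every)
        return [base_lr * factor for base_lr in self.base_lrs]

# Used exactly like a built-in scheduler: call scheduler.step() once per epoch
scheduler = HalvingLR(optimizer, decay_every=10)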


Adding regularization:

To add regularization to the optimizer, we can modify the step() method to include the regularization term in the update of the model parameters. For example, we can add L1 or L2 regularization by modifying the step() method to include a term that penalizes the absolute or squared values of the parameters, respectively.

Python3

import math
import torch

# Define custom optimizer: Adam with weight decay applied to the gradients
class MyAdam(torch.optim.Adam):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), weight_decay=0):
        super().__init__(params, lr=lr, betas=betas, weight_decay=weight_decay)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                if grad.is_sparse:
                    raise RuntimeError("Adam does not support sparse gradients")

                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state["step"] = 0
                    # Exponential moving average of gradient values
                    state["exp_avg"] = torch.zeros_like(p.data)
                    # Exponential moving average of squared gradient values
                    state["exp_avg_sq"] = torch.zeros_like(p.data)

                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                beta1, beta2 = group["betas"]

                state["step"] += 1

                # Weight decay: add weight_decay * p to the gradient
                if group["weight_decay"] != 0:
                    grad = grad.add(p.data, alpha=group["weight_decay"])

                # Decay the first and second moment running averages
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group["eps"])

                # Bias-corrected step size
                bias_correction1 = 1 - beta1 ** state["step"]
                bias_correction2 = 1 - beta2 ** state["step"]
                step_size = group["lr"] * math.sqrt(bias_correction2) / bias_correction1

                p.data.addcdiv_(exp_avg, denom, value=-step_size)

        return loss

# Optimizer
optimizer = MyAdam(model.parameters(), weight_decay=0.00002)


In the above code, we create a custom Adam optimizer that includes weight decay regularization by adding a weight_decay parameter to the optimizer and modifying the step() method to include the weight decay term in the update of the parameters. The weight decay term is applied to the gradients by grad = grad.add(p.data, alpha=group["weight_decay"]), which penalizes large parameter values by shrinking their updates.
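
The same approach works for L1 regularization, which penalizes the absolute values of the parameters. The sketch below is a minimal illustration (the class name MySGDL1 and its l1_lambda argument are invented for this example, not part of PyTorch): it adds the L1 subgradient, l1_lambda * sign(p), to each parameter's gradient and then delegates to the standard SGD update.

Python3

# A minimal sketch of L1 regularization inside a custom optimizer
# (MySGDL1 and l1_lambda are illustrative, not part of PyTorch)
class MySGDL1(torch.optim.SGD):
    def __init__(self, params, lr=0.01, l1_lambda=1e-4):
        super().__init__(params, lr=lr)
        self.l1_lambda = l1_lambda

    def step(self, closure=None):
        # Add the L1 subgradient l1_lambda * sign(p) to each gradient,
        # then let the parent SGD perform its usual update
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is not None:
                    p.grad.data.add_(torch.sign(p.data), alpha=self.l1_lambda)
        return super().step(closure)

optimizer = MySGDL1(model.parameters(), lr=0.01, l1_lambda=1e-4)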

Implementing a new optimization algorithm: 

PyTorch provides several built-in optimization algorithms, such as SGD, Adam, and Adagrad. However, there are many other optimization algorithms that are not included in the library. By creating a custom optimizer, we can implement any optimization algorithm that we want.

Python3

class MyOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr=0.01):
        defaults = dict(lr=lr)
        super(MyOptimizer, self).__init__(params, defaults)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Update rule: subtract lr * grad**2
                # (uses the squared gradient instead of the gradient itself)
                p.data = p.data - group['lr'] * p.grad.data ** 2
        return loss

optimizer = MyOptimizer(model.parameters(), lr=0.001)


In this example, we created a new optimizer called MyOptimizer that updates the parameters using the squared gradient values instead of the gradients themselves. Note that this rule is purely illustrative: squaring discards the sign of the gradient, so every update moves the parameters in the negative direction.

Using multiple optimizers:

 In some cases, we may want to use different optimizers for different parts of the model. For example, we may want to use Adam for the parameters of the convolutional layers, and SGD for the parameters of the fully-connected layers. This can be achieved by creating multiple instances of the optimizer, one for each set of parameters.

Python3

# Define different optimizers for different parts of the model
params1 = model.conv_layers.parameters()
params2 = model.fc_layers.parameters()
  
optimizer1 = torch.optim.Adam(params1)
optimizer2 = torch.optim.SGD(params2, lr=0.01)
  
# In the training loop
for i in range(num_epochs):
    # Perform the training step
    ...
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    loss.backward()
    optimizer1.step()
    optimizer2.step()


In this example, we are using Adam optimizer for the parameters of the convolutional layers, and SGD optimizer with a fixed learning rate of 0.01 for the parameters of the fully-connected layers. This can help fine-tune the training of specific parts of the model.
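
A closely related option, when only the hyperparameters (rather than the algorithm) need to differ, is to pass several parameter groups to a single optimizer and give each group its own settings. The sketch below assumes the same model.conv_layers and model.fc_layers attributes as above.

Python3

# A single optimizer with per-group hyperparameters
# (assumes the same model.conv_layers / model.fc_layers as above)
optimizer = torch.optim.SGD([
    {'params': model.conv_layers.parameters(), 'lr': 0.001},
    {'params': model.fc_layers.parameters(), 'lr': 0.01},
])

# Inside the training loop, one zero_grad()/step() call updates both groups
optimizer.zero_grad()
loss.backward()
optimizer.step()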

Custom Optimizers in PyTorch

In PyTorch, an optimizer is a specific implementation of the optimization algorithm that is used to update the parameters of a neural network. The optimizer updates the parameters in such a way that the loss of the neural network is minimized. PyTorch provides various built-in optimizers such as SGD, Adam, Adagrad, etc. that can be used out of the box. However, in some cases, the built-in optimizers may not be suitable for a particular problem or may not perform well. In such cases, one can create their own custom optimizer.

A custom optimizer in PyTorch is a class that inherits from the torch.optim.Optimizer base class. The custom optimizer should implement the __init__ and step methods. The __init__ method is used to initialize the optimizer's internal state, and the step method is used to update the parameters of the model.
