Demonstrating PyTorch Learning Rate Scheduling

Colab link: Learning rate scheduler

Importing libraries

Python3

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from torch.utils.data import DataLoader, TensorDataset


Loading dataset

You can download the Breast Cancer Wisconsin (Diagnostic) dataset from here.

Python3

df = pd.read_csv("breast-cancer.csv")
df.head()


Output:

         id diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  \
0    842302         M        17.99         10.38          122.80     1001.0
1    842517         M        20.57         17.77          132.90     1326.0
2  84300903         M        19.69         21.25          130.00     1203.0
3  84348301         M        11.42         20.38           77.58      386.1
4  84358402         M        20.29         14.34          135.10     1297.0

   smoothness_mean  compactness_mean  concavity_mean  concave points_mean  \
0          0.11840           0.27760          0.3001              0.14710
1          0.08474           0.07864          0.0869              0.07017
2          0.10960           0.15990          0.1974              0.12790
3          0.14250           0.28390          0.2414              0.10520
4          0.10030           0.13280          0.1980              0.10430

   ...  radius_worst  texture_worst  perimeter_worst  area_worst  \
0  ...         25.38          17.33           184.60      2019.0
1  ...         24.99          23.41           158.80      1956.0
2  ...         23.57          25.53           152.50      1709.0
3  ...         14.91          26.50            98.87       567.7
4  ...         22.54          16.67           152.20      1575.0

   smoothness_worst  compactness_worst  concavity_worst  concave points_worst  \
0            0.1622             0.6656           0.7119                0.2654
1            0.1238             0.1866           0.2416                0.1860
2            0.1444             0.4245           0.4504                0.2430
3            0.2098             0.8663           0.6869                0.2575
4            0.1374             0.2050           0.4000                0.1625

   symmetry_worst  fractal_dimension_worst
0          0.4601                  0.11890
1          0.2750                  0.08902
2          0.3613                  0.08758
3          0.6638                  0.17300
4          0.2364                  0.07678

[5 rows x 32 columns]

Data extraction and encoding

  • X is a DataFrame of features, obtained by dropping the “diagnosis” and “id” columns from the original DataFrame df.
  • y is a Series holding the target variable “diagnosis” from df.
  • The values of y are mapped to numbers: ‘M’ (Malignant) becomes 1 and ‘B’ (Benign) becomes 0.

Python3

X = df.drop(["diagnosis", "id"], axis=1)
y = df["diagnosis"].map({"M": 1, "B": 0})
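
As an optional sanity check (a quick sketch, not part of the original walkthrough), you can confirm the encoding worked; the class counts shown assume the standard 569-row version of this dataset.

Python3

print(y.value_counts())
# Expected for the standard dataset: 357 benign (0) and 212 malignant (1)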


Train-test split and standardisation

  • The train_test_split function from scikit-learn is used to split the dataset (X and y) into training and testing sets.
  • X_train and X_test are the training and testing sets of features, respectively.
  • Y_train and Y_test are the corresponding training and testing sets of target labels.
  • A StandardScaler instance is created, which is a preprocessing step to standardize the features.
  • X_train_std is obtained by fitting the scaler on X_train and then transforming it, so that each feature in the training data has a mean of 0 and a standard deviation of 1.
  • X_test_std is standardized using the parameters learned from the training data (X_train), ensuring consistency in the scaling process.
  • random_state=2 is set for reproducibility. This ensures that if you run the code multiple times, you get the same train-test split.

Python3

X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.2, random_state=2)
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
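
A quick optional check (a sketch) that standardisation behaved as expected: after fit_transform, every feature column of the training set should have mean ≈ 0 and standard deviation ≈ 1.

Python3

print(X_train_std.mean(axis=0).round(6))  # approximately 0 for every feature
print(X_train_std.std(axis=0).round(6))   # approximately 1 for every feature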


TensorDataset and DataLoader

  • The training features X_train_std (a NumPy array) and the target Series Y_train are converted to PyTorch tensors with torch.FloatTensor; .values extracts the underlying NumPy array from the Series.
  • Y_train_tensor is reshaped with .view(-1, 1) so it has shape (num_samples, 1), matching the model's output. The -1 infers the size from the length of the array, and 1 gives a single column.
  • The test set features (X_test_std) and target labels (Y_test) are converted and reshaped the same way.
  • A TensorDataset is created for the training data, combining the features (X_train_std_tensor) and targets (Y_train_tensor) into a single dataset.
  • DataLoader is then used to create an iterator over the dataset with a specified batch size of 32 and shuffling the data (shuffle=True).

Python3

X_train_std_tensor = torch.FloatTensor(X_train_std)
Y_train_tensor = torch.FloatTensor(Y_train.values).view(-1, 1)
 
X_test_std_tensor = torch.FloatTensor(X_test_std)
Y_test_tensor = torch.FloatTensor(Y_test.values).view(-1, 1)
 
train_dataset = TensorDataset(X_train_std_tensor, Y_train_tensor)
train_loader = DataLoader(dataset=train_dataset, batch_size=32, shuffle=True)
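
Before building the model, it can be worth confirming the tensor shapes line up (an optional sketch; the sizes shown assume the standard 569-row dataset with the 80/20 split used above).

Python3

print(X_train_std_tensor.shape, Y_train_tensor.shape)
# torch.Size([455, 30]) torch.Size([455, 1])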


Model creation

  • Input Layer: 30 features.
  • Hidden Layers: Two hidden layers with 64 and 32 units, respectively.
  • Activation Functions: ReLU after each hidden layer, Sigmoid at the output.
  • Output Layer: Single unit for binary classification.

Python3

model = nn.Sequential(
    nn.Linear(30, 64),  # Input layer with 30 features, hidden layer with 64 units
    nn.ReLU(),
    nn.Linear(64, 32),  # Hidden layer with 32 units
    nn.ReLU(),
    nn.Linear(32, 1),   # Output layer with 1 unit (for binary classification)
    nn.Sigmoid()
)
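
To sanity-check the architecture (an optional sketch with a made-up batch), pass a dummy input through the model and confirm the output shape: one probability per sample.

Python3

dummy = torch.randn(4, 30)  # fake batch: 4 samples, 30 features
print(model(dummy).shape)   # torch.Size([4, 1])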


Loss function and optimizer

  • criterion = nn.BCELoss(): Binary Cross Entropy Loss is chosen as the loss function, suitable for binary classification tasks.
  • optimizer = optim.Adam(model.parameters(), lr=0.001): Adam optimizer is used for gradient-based optimization with a learning rate of 0.001.

Python3

criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
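
The scheduler introduced next works by updating the lr entry in the optimizer's param_groups; you can inspect the current value directly (a one-line sketch):

Python3

print(optimizer.param_groups[0]["lr"])  # 0.001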


Learning Rate Scheduler

  • The learning rate is adjusted with a StepLR scheduler, which multiplies it by gamma=0.5 every step_size=20 epochs. A quick way to inspect the resulting schedule is sketched after the code below.

Python3

scheduler = lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
 
num_epochs = 50
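
Before training, it can help to see the schedule StepLR will produce. The snippet below is a small sketch, not part of the original walkthrough: it steps a throwaway scheduler built around a dummy parameter (inspect_opt and inspect_sched are hypothetical names) and prints the learning rate every 10 epochs.

Python3

# Step a disposable copy of the schedule so the real optimizer is untouched
inspect_opt = optim.Adam([torch.zeros(1, requires_grad=True)], lr=0.001)
inspect_sched = lr_scheduler.StepLR(inspect_opt, step_size=20, gamma=0.5)

for epoch in range(num_epochs):
    inspect_sched.step()
    if (epoch + 1) % 10 == 0:
        print(f"epoch {epoch + 1}: lr = {inspect_sched.get_last_lr()[0]}")
# epoch 10: lr = 0.001, epoch 20: lr = 0.0005, epoch 30: lr = 0.0005,
# epoch 40: lr = 0.00025, epoch 50: lr = 0.00025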


Training Loop

  • for epoch in range(num_epochs):: iterates through the specified number of epochs (50 in this case).
  • model.train(): sets the model to training mode.
  • The inner loop draws mini-batches of inputs and targets from train_loader.
  • outputs = model(inputs): forward pass to obtain model predictions.
  • loss = criterion(outputs, targets): calculates the binary cross-entropy loss. The targets already have shape (batch_size, 1) from the earlier .view(-1, 1), so no extra reshaping is needed.
  • Backward pass and gradient update per batch, then one learning rate adjustment per epoch via scheduler.step().

Python3

# Training loop
for epoch in range(num_epochs):
    model.train()
 
    for inputs, targets in train_loader:
        outputs = model(inputs)
        # targets already have shape (batch_size, 1) from the earlier .view(-1, 1)
        loss = criterion(outputs, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
 
    # Adjust learning rate
    scheduler.step()
 
    # Print loss for monitoring
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}')


Output:

Epoch [1/50], Loss: 0.5196633338928223
Epoch [2/50], Loss: 0.29342177510261536
Epoch [3/50], Loss: 0.19762122631072998
Epoch [4/50], Loss: 0.19884507358074188
Epoch [5/50], Loss: 0.028389474377036095
Epoch [6/50], Loss: 0.007852290757000446
Epoch [7/50], Loss: 0.040723469108343124
Epoch [8/50], Loss: 0.04233770817518234
Epoch [9/50], Loss: 0.2953278720378876
Epoch [10/50], Loss: 0.020912442356348038

Evaluation metrics

  • model.eval(): Sets the model to evaluation mode.
  • with torch.no_grad():: Temporarily disables gradient computation during evaluation.
  • test_outputs = model(X_test_std_tensor): Forward pass on the test set.
  • test_predictions = (test_outputs >= 0.5).float(): Converting model probabilities to binary predictions using a threshold of 0.5.
  • accuracy = (test_predictions == Y_test_tensor).float().mean().item(): Calculating accuracy based on binary predictions and true labels.

Python3

model.eval()
with torch.no_grad():
    test_outputs = model(X_test_std_tensor)
    test_predictions = (test_outputs >= 0.5).float()  # Convert probabilities to binary predictions
 
    # Evaluation metrics (you can use appropriate metrics based on your problem)
    accuracy = (test_predictions == Y_test_tensor).float().mean().item()
    print(f'Test Accuracy: {accuracy}')


Output:

Test Accuracy: 0.9561403393745422

A test accuracy of approximately 95.6% suggests that the trained network generalizes well to the held-out test set.
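
Accuracy alone can be misleading when classes are imbalanced, as they are here (malignant cases are the minority). As an optional sketch building on the comment in the code above, precision and recall can be computed from the same tensors with scikit-learn, which is already used for preprocessing:

Python3

from sklearn.metrics import precision_score, recall_score

y_true = Y_test_tensor.numpy().ravel()
y_pred = test_predictions.numpy().ravel()
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))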

Understanding PyTorch Learning Rate Scheduling

PyTorch's dynamic computational graph and user-friendly interface have made it a preferred framework for developing neural networks. One aspect of model training that demands close attention is the learning rate: set it too high and training diverges, too low and it crawls. To manage it effectively over the course of training, PyTorch provides learning rate schedulers. This article has demystified the PyTorch learning rate scheduler, covering its syntax, parameters, and role in improving the efficiency and effectiveness of model training.
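
StepLR is only one of several schedulers built into torch.optim.lr_scheduler, and the usage pattern is the same for most of them: construct the scheduler around an optimizer, then call step() once per epoch. The snippet below is a minimal standalone sketch (not part of the tutorial above) using ExponentialLR, which multiplies the learning rate by gamma after every epoch.

Python3

import torch
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

params = [torch.zeros(1, requires_grad=True)]  # placeholder parameters
optimizer = optim.SGD(params, lr=0.1)
scheduler = lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(5):
    # ... forward pass, loss.backward() and optimizer.step() would go here ...
    optimizer.step()
    scheduler.step()
    print(f"epoch {epoch + 1}: lr = {scheduler.get_last_lr()[0]:.5f}")
# Prints 0.09000, 0.08100, 0.07290, 0.06561, 0.05905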
