Implementing Response Surface Methodology

Suppose we are tuning a neural network with the following hyperparameters:

  • Learning Rate: 0.001, 0.01, 0.1
  • Batch Size: 16, 32, 64
  • Number of Hidden Layers: 1, 2, 3

Using RSM, we design experiments that systematically vary these hyperparameters, train the neural network at each design point, and fit a quadratic regression model to the resulting accuracies. Analyzing this model shows how each hyperparameter, and the interactions between them, affect accuracy. Finally, we apply optimization techniques to find the combination of hyperparameters that maximizes accuracy.
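
Concretely, the quadratic (second-order) response surface model fitted below has the form

    Accuracy = b0 + b1·LearningRate + b2·BatchSize + b3·HiddenLayers
               + b11·LearningRate² + b22·BatchSize² + b33·HiddenLayers²
               + b12·LearningRate·BatchSize + b13·LearningRate·HiddenLayers
               + b23·BatchSize·HiddenLayers + error

where the b coefficients are estimated by least squares and "error" is the residual term.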

Hyperparameter Optimization Using Central Composite Design

The code below walks through the optimization in the following steps:

  • Data Preparation: Load the Iris dataset and split it into training and testing sets.
  • Hyperparameter Ranges: Define the ranges for learning rate, batch size, and the number of hidden layers.
  • Central Composite Design (CCD): Generate the combinations of hyperparameters to evaluate. The example code approximates this with a simple full factorial grid; a sketch of true CCD points follows this list.
  • Conduct Experiments: Train the neural network with different combinations of hyperparameters and record the accuracy.
  • Fit a Regression Model: Use statsmodels to fit a quadratic regression model to the collected data.
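
For reference, a true central composite design is built from factorial (corner) points, axial (star) points, and replicated center points expressed in coded units. A minimal NumPy sketch of such a design for three factors is shown below; the main example that follows instead uses a simpler full 3×3×3 grid of the listed values:

Python
import itertools
import numpy as np

# Coded CCD for k = 3 factors: 2^k factorial corners, 2k axial (star) points, plus center points
k = 3
alpha = np.sqrt(k)  # "spherical" choice of axial distance; other choices (e.g. rotatable) are possible
corners = np.array(list(itertools.product([-1, 1], repeat=k)))  # 8 factorial points
axial = np.vstack([alpha * np.eye(k), -alpha * np.eye(k)])      # 6 axial points
center = np.zeros((3, k))                                       # 3 replicated center points
ccd_coded = np.vstack([corners, axial, center])                 # 17 runs in coded units
print(ccd_coded.shape)  # (17, 3)

Each coded column would then be mapped back onto the actual hyperparameter range (for example -1 -> 0.001 and +1 -> 0.1 for the learning rate) before running the training experiments.
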
Python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]
hidden_layers = [1, 2, 3]

# Experimental design - simplified here to a full 3x3x3 factorial grid (a true CCD would add axial and center points)
experiments = []
for lr in learning_rates:
    for bs in batch_sizes:
        for hl in hidden_layers:
            experiments.append((lr, bs, hl))

# Conduct experiments
results = []
for lr, bs, hl in experiments:
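    # Note: hidden_layer_sizes=(hl,) builds a single hidden layer with hl neurons;
    # to vary the number of layers instead, something like (10,) * hl could be used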
    model = MLPClassifier(hidden_layer_sizes=(hl,), learning_rate_init=lr, batch_size=bs, max_iter=200, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results.append((lr, bs, hl, accuracy))

df = pd.DataFrame(results, columns=['LearningRate', 'BatchSize', 'HiddenLayers', 'Accuracy'])
# Fit a quadratic regression model
formula = 'Accuracy ~ LearningRate + BatchSize + HiddenLayers + I(LearningRate**2) + I(BatchSize**2) + I(HiddenLayers**2) + LearningRate:BatchSize + LearningRate:HiddenLayers + BatchSize:HiddenLayers'
model = ols(formula, data=df).fit()
print(model.summary())

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:               Accuracy   R-squared:                       0.809
Model:                            OLS   Adj. R-squared:                  0.708
Method:                 Least Squares   F-statistic:                     7.989
Date:                Wed, 29 May 2024   Prob (F-statistic):           0.000140
Time:                        09:36:25   Log-Likelihood:                 23.868
No. Observations:                  27   AIC:                            -27.74
Df Residuals:                      17   BIC:                            -14.78
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
=============================================================================================
                                coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------
Intercept                     0.6972      0.266      2.625      0.018       0.137       1.258
LearningRate                 26.1088      7.556      3.455      0.003      10.167      42.051
BatchSize                    -0.0063      0.009     -0.698      0.495      -0.025       0.013
HiddenLayers                  0.1191      0.217      0.550      0.589      -0.338       0.576
I(LearningRate ** 2)       -328.8024     70.223     -4.682      0.000    -476.961    -180.644
I(BatchSize ** 2)          3.617e-05      0.000      0.354      0.728      -0.000       0.000
I(HiddenLayers ** 2)         -0.0259      0.051     -0.504      0.621      -0.134       0.083
LearningRate:BatchSize        0.0325      0.027      1.197      0.248      -0.025       0.090
LearningRate:HiddenLayers     2.9613      0.664      4.458      0.000       1.560       4.363
BatchSize:HiddenLayers        0.0004      0.001      0.283      0.780      -0.003       0.004
==============================================================================
Omnibus:                        3.585   Durbin-Watson:                   2.874
Prob(Omnibus):                  0.167   Jarque-Bera (JB):                2.268
Skew:                           0.504   Prob(JB):                        0.322
Kurtosis:                       2.000   Cond. No.                     7.12e+06
==============================================================================
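
With an R-squared of about 0.81, the quadratic surface captures most of the variation in accuracy, and the learning-rate terms (linear and quadratic) together with the LearningRate:HiddenLayers interaction stand out as significant at the 5% level. To list the significant terms programmatically, one can filter the fitted model's p-values, for example:

Python
# Terms whose p-values fall below the 5% significance level
significant = model.pvalues[model.pvalues < 0.05]
print(significant.round(4))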

Analyzing the Response Surface

Optimization is the ultimate goal of RSM, aiming to find the best settings of factors that maximize or minimize the response. Techniques include:

  1. Gradient Descent: Iteratively moving towards the optimal point by following the slope of the response surface.
  2. Contour Plots: Visual tools to identify optimal regions by plotting the response against two factors while holding others constant.
  3. 3D Surface Plots: Provide a three-dimensional view of the response surface, aiding in visualizing the effect of factors and their interactions.

The code below plots the observed accuracies as a 3D scatter to visualize how learning rate and batch size affect accuracy.

Python
# Analyze the response surface
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['LearningRate'], df['BatchSize'], df['Accuracy'], c='r', marker='o')
ax.set_xlabel('Learning Rate')
ax.set_ylabel('Batch Size')
ax.set_zlabel('Accuracy')
plt.show()

Output:

Response surface
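
The scatter above shows only the raw experimental points. To see the fitted quadratic surface itself, the regression model can be evaluated over a grid of learning rates and hidden-layer counts while holding batch size fixed, and drawn as a contour plot, one of the standard RSM visual tools listed earlier. A minimal sketch reusing the fitted model and imports from above (fixing the batch size at 32 is an arbitrary choice):

Python
# Evaluate the fitted quadratic model on a grid (batch size held at 32)
lr_grid, hl_grid = np.meshgrid(np.linspace(0.001, 0.1, 50), np.linspace(1, 3, 50))
grid_df = pd.DataFrame({
    'LearningRate': lr_grid.ravel(),
    'BatchSize': np.full(lr_grid.size, 32.0),
    'HiddenLayers': hl_grid.ravel(),
})
pred = np.asarray(model.predict(grid_df)).reshape(lr_grid.shape)

plt.contourf(lr_grid, hl_grid, pred, levels=20, cmap='viridis')
plt.colorbar(label='Predicted Accuracy')
plt.xlabel('Learning Rate')
plt.ylabel('Hidden Layers')
plt.title('Predicted Accuracy (Batch Size = 32)')
plt.show()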


Optimization (Gradient Descent – Simplified)

We use scipy.optimize.minimize on the fitted response surface to find the combination of hyperparameters that maximizes the predicted accuracy.

Python
# Optimization (Gradient Descent - Simplified)
from scipy.optimize import minimize

def objective(params):
    lr, bs, hl = params
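    # 'model' is the fitted OLS response surface; the prediction is negated so that
    # minimizing the objective maximizes the predicted accuracy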
    return -model.predict(pd.DataFrame({'LearningRate': [lr], 'BatchSize': [bs], 'HiddenLayers': [hl]}))[0]

initial_guess = [0.01, 32, 2]
result = minimize(objective, initial_guess, bounds=[(0.001, 0.1), (16, 64), (1, 3)], method='L-BFGS-B')
optimal_params = result.x

print(f'Optimal Learning Rate: {optimal_params[0]}')
print(f'Optimal Batch Size: {optimal_params[1]}')
print(f'Optimal Hidden Layers: {optimal_params[2]}')

Output:

Optimal Learning Rate: 0.0540041673343001
Optimal Batch Size: 16.0
Optimal Hidden Layers: 3.0
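
Because batch size and the number of hidden layers are integer-valued, the continuous optimum should be rounded before use. As a sanity check, one can retrain the network at the suggested settings (mirroring the MLP construction used in the experiments above) and confirm its accuracy on the held-out test set. A minimal sketch:

Python
# Round the integer-valued hyperparameters before re-using them
best_lr, best_bs, best_hl = optimal_params
best_bs = int(round(best_bs))
best_hl = int(round(best_hl))

# Re-train at the suggested optimum and evaluate on the test set
final_model = MLPClassifier(hidden_layer_sizes=(best_hl,), learning_rate_init=best_lr,
                            batch_size=best_bs, max_iter=200, random_state=42)
final_model.fit(X_train, y_train)
print('Test accuracy at suggested optimum:', accuracy_score(y_test, final_model.predict(X_test)))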
