Optimizing Machine Learning Models Using Response Surface Methodology

Optimizing complex processes and machine learning models is a common and often expensive task. One technique well suited to it is Response Surface Methodology (RSM). This article explains the principles of RSM, outlines its applications, and works through a practical example.

Table of Contents

  • What is Response Surface Methodology (RSM)?
  • Why Use RSM in Machine Learning?
  • Step-by-Step Process of RSM in Machine Learning
  • Implementing Response Surface Methodology
    • Hyperparameter Optimization Using Central Composite Design
    • Analyze response surface
    • Optimization (Gradient Descent – Simplified)
  • Use-Cases and Applications for Response Surface Methodology
  • Advantages and Limitations of Response Surface Methodology

What is Response Surface Methodology (RSM)?

Response Surface Methodology (RSM) is a collection of mathematical and statistical techniques useful for developing, improving, and optimizing processes. Introduced by George E.P. Box and K.B. Wilson in 1951, RSM focuses on the relationships between several explanatory variables and one or more response variables. It is extensively used in engineering, manufacturing, pharmaceuticals, and food sciences to fine-tune processes and enhance product quality.

RSM is particularly effective when the goal is to find the optimal conditions for a system with several input variables. More recently, it has also been applied to machine learning.

Key Concepts of Response Surface Methodology

RSM involves a few fundamental concepts:

  1. Factors: Independent variables that influence the response.
  2. Response: The outcome or dependent variable being measured.
  3. Design of Experiments (DoE): A structured approach to conducting experiments to explore the effects of factors on the response efficiently.
  4. Regression Modeling: Using polynomial equations to approximate the relationship between factors and response.
  5. Optimization: Identifying the factor levels that maximize or minimize the response.

Why Use RSM in Machine Learning?

In machine learning, RSM can be instrumental in hyperparameter tuning, model selection, and performance optimization. Traditional methods like grid search or random search can be computationally expensive and time-consuming. RSM offers a more efficient alternative by systematically exploring the parameter space and building predictive models to identify optimal settings.

1. Efficiency in Hyperparameter Tuning

Hyperparameter tuning is crucial for optimizing the performance of machine learning models, but every candidate configuration costs a full training run, so exhaustive approaches scale poorly as the number of hyperparameters grows.

RSM provides a more systematic and efficient approach. By using a structured design of experiments (DoE), RSM explores the hyperparameter space more intelligently. It builds a predictive model (often a polynomial regression model) that approximates the relationship between hyperparameters and model performance. This allows for a more focused search in regions of the hyperparameter space that are likely to yield better performance, reducing the number of experiments needed.
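To make the claim about fewer experiments concrete, the following minimal sketch compares the number of training runs in a three-level full grid with the number in a central composite design. The function names are illustrative, and the single center point (`n_center=1`) is an assumption; practitioners often replicate the center point several times.

```python
def full_factorial_runs(n_factors: int, n_levels: int) -> int:
    """Runs needed to try every combination of levels."""
    return n_levels ** n_factors


def ccd_runs(n_factors: int, n_center: int = 1) -> int:
    """Runs in a central composite design with a full two-level factorial core."""
    factorial_points = 2 ** n_factors   # corners of the coded cube
    axial_points = 2 * n_factors        # two points on each factor axis
    return factorial_points + axial_points + n_center


for k in (2, 3, 4, 5):
    print(f"{k} factors: grid = {full_factorial_runs(k, 3)}, CCD = {ccd_runs(k)}")
```

The gap widens quickly as factors are added, which is where a designed experiment starts to pay off.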

The first step is to design an experiment that systematically varies the factors and records the response. Common designs include:

  • Full Factorial Design: Examines all possible combinations of factors.
  • Fractional Factorial Design: Examines a subset of possible combinations.
  • Central Composite Design (CCD): Combines factorial points, center points, and axial points to fit a quadratic model.
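As a rough illustration of how the three kinds of CCD points fit together, the sketch below generates a design in coded units, where -1 and +1 are the low and high levels of each factor. The function name and defaults are assumptions made for this example; the default axial distance uses the usual rotatable choice, the fourth root of the number of factorial points.

```python
from itertools import product


def central_composite_design(n_factors, alpha=None, n_center=1):
    """Return CCD points in coded units as a list of tuples."""
    if alpha is None:
        # Rotatable design: alpha = (number of factorial points) ** 0.25
        alpha = (2 ** n_factors) ** 0.25

    # Factorial points: every corner of the coded cube
    factorial = [tuple(float(v) for v in p)
                 for p in product((-1, 1), repeat=n_factors)]

    # Axial points: one factor at +/- alpha, all others at the center
    axial = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            axial.append(tuple(point))

    # Center points: all factors at their mid level
    center = [tuple([0.0] * n_factors)] * n_center
    return factorial + axial + center


design = central_composite_design(3)
print(len(design), "runs")
for point in design:
    print(point)
```

Note that with alpha greater than 1 the axial points fall outside the low and high levels, so the real-valued ranges must leave room for them. Passing `alpha=1.0` gives a face-centered design that stays inside the original ranges.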

2. Building Predictive Models

RSM involves fitting a regression model to the results of the experiments. This model, often a second-order polynomial, describes how the response variable (e.g., model accuracy) changes with the hyperparameters. By analyzing this model, one can understand the interactions between hyperparameters and their combined effect on performance. This is particularly useful in machine learning, where hyperparameters often interact in complex ways.
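For reference, the second-order model for k factors is usually written as follows, where y is the response, the x_i are the factors, and ε is the error term. The symbols are the conventional ones and are not taken from the code in this article.

```latex
y = \beta_0
  + \sum_{i=1}^{k} \beta_i x_i
  + \sum_{i=1}^{k} \beta_{ii} x_i^2
  + \sum_{i<j} \beta_{ij} x_i x_j
  + \varepsilon
```

The model has 1 + 2k + k(k-1)/2 coefficients, so three factors require estimating 10 of them, and the design must contain at least that many distinct runs.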

For example, in a neural network, the learning rate and batch size might interact in a non-linear manner. RSM can capture these interactions and provide insights that are not easily obtainable through grid or random search. This predictive model can then be used to identify the optimal combination of hyperparameters more efficiently.

3. Optimization

Once the predictive model is built, RSM uses optimization techniques to find the best combination of hyperparameters. This is typically done by finding the maximum (or minimum) of the response surface. Techniques like gradient descent or evolutionary algorithms can be employed to navigate the response surface and identify the optimal settings.
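For a fitted quadratic surface the optimum can also be located analytically. Writing the model as y = b0 + x'b + x'Bx, with B symmetric, the stationary point is x_s = -0.5 * inv(B) * b, and the signs of the eigenvalues of B show whether it is a maximum, a minimum, or a saddle. The sketch below assumes this matrix form; the coefficients are hypothetical values chosen so the answer is easy to verify by hand.

```python
import numpy as np


def stationary_point(b, B):
    """Locate and classify the stationary point of y = b0 + x'b + x'Bx.

    b holds the linear coefficients. B is symmetric: its diagonal holds the
    squared-term coefficients and each off-diagonal entry holds half of the
    corresponding interaction coefficient.
    """
    b = np.asarray(b, dtype=float)
    B = np.asarray(B, dtype=float)
    x_s = -0.5 * np.linalg.solve(B, b)
    eigenvalues = np.linalg.eigvalsh(B)
    if np.all(eigenvalues < 0):
        kind = "maximum"
    elif np.all(eigenvalues > 0):
        kind = "minimum"
    else:
        kind = "saddle"
    return x_s, kind


# Hypothetical surface: y = 0.9 - (x1 - 0.5)^2 - 2 * (x2 + 0.25)^2
# Expanded, the linear part is x1 - x2 and the squared part is -x1^2 - 2*x2^2.
b = [1.0, -1.0]
B = [[-1.0, 0.0],
     [0.0, -2.0]]
x_s, kind = stationary_point(b, B)
print(x_s, kind)
```

If the stationary point is a saddle or lies outside the experimental region, a bounded numerical search over the region is the safer choice.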

For instance, in hyperparameter tuning of a Support Vector Machine (SVM), RSM can help identify the optimal values for the regularization parameter and kernel parameters by systematically exploring the parameter space and fitting a response surface model. This approach can be more efficient than grid search, which evaluates every combination of candidate values, and more targeted than random search, which may sample the optimal region only sparsely.

Step-by-Step Process of RSM in Machine Learning

The steps involved in applying RSM are listed below, using the tuning of a neural network as a running example:

  1. Define the Objective: Maximize the accuracy of a neural network on a given dataset.
  2. Select Factors and Levels: Identify hyperparameters such as learning rate, batch size, and the number of hidden layers.
  3. Design of Experiments (DoE): Use a Central Composite Design (CCD) to explore the parameter space.
  4. Conduct Experiments: Train the neural network with different combinations of hyperparameters as per the CCD and record the accuracy.
  5. Fit a Regression Model: Fit a quadratic regression model to the collected data.
  6. Analyze the Response Surface: Plot the response surface to visualize how changes in hyperparameters affect accuracy.
  7. Optimization: Use optimization techniques to find the combination of hyperparameters that maximizes accuracy.
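Steps 3 to 7 can be sketched end to end on a toy problem. In the example below, `toy_response` is a hypothetical stand-in for "train a model and return its accuracy", and both factors are expressed in coded units between -1 and 1. Because the toy response is exactly quadratic, the fit recovers it without error; real accuracy measurements would be noisy.

```python
import numpy as np
from itertools import product


def quadratic_features(X):
    """Columns: 1, x1, x2, x1^2, x2^2, x1*x2 for two coded factors."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])


def toy_response(X):
    """Hypothetical stand-in for an expensive training run."""
    x1, x2 = X[:, 0], X[:, 1]
    return 0.9 - 0.1 * (x1 - 0.3) ** 2 - 0.05 * (x2 + 0.2) ** 2


# Step 3: face-centered CCD for two factors (corners, edge midpoints, center)
design = np.array(list(product((-1.0, 0.0, 1.0), repeat=2)))

# Step 4: run the experiments
y = toy_response(design)

# Step 5: fit the quadratic model by least squares
coef, *_ = np.linalg.lstsq(quadratic_features(design), y, rcond=None)

# Steps 6 and 7: evaluate the fitted surface on a fine grid and pick the best point
axis = np.linspace(-1.0, 1.0, 201)
grid = np.array(list(product(axis, axis)))
predicted = quadratic_features(grid) @ coef
best = grid[np.argmax(predicted)]

print("coefficients:", np.round(coef, 3))
print("best coded setting:", best)
```

Only nine evaluations of the expensive function are needed; the 40,401 grid evaluations are made on the cheap fitted polynomial.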

Implementing Response Surface Methodology

Suppose we are tuning a neural network with the following hyperparameters:

  • Learning Rate: 0.001, 0.01, 0.1
  • Batch Size: 16, 32, 64
  • Number of Hidden Layers: 1, 2, 3

Using RSM, we design experiments to systematically vary these hyperparameters and train the neural network. We then fit a quadratic regression model to the results. By analyzing this model, we can understand how each hyperparameter, and the interactions between them, affect the accuracy. Finally, we use optimization techniques to find the combination of hyperparameters that maximizes accuracy.

Hyperparameter Optimization Using Central Composite Design

The code below optimizes the hyperparameters in the following steps:

  • Data Preparation: Load the Iris dataset and split it into training and testing sets.
  • Hyperparameter Levels: Define three levels each for learning rate, batch size, and the hidden-layer setting.
  • Experimental Design: Generate the combinations of hyperparameters to test. For simplicity, the code uses a full 3 × 3 × 3 factorial grid (27 runs) as a stand-in for a true Central Composite Design (CCD).
  • Conduct Experiments: Train the neural network with each combination of hyperparameters and record the accuracy.
  • Fit a Regression Model: Use statsmodels to fit a quadratic regression model to the collected data.

import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
learning_rates = [0.001, 0.01, 0.1]
batch_sizes = [16, 32, 64]
hidden_layers = [1, 2, 3]

# Experimental design: a full 3 x 3 x 3 factorial grid (27 runs), used here
# as a simplified stand-in for a true Central Composite Design (CCD)
experiments = []
for lr in learning_rates:
    for bs in batch_sizes:
        for hl in hidden_layers:
            experiments.append((lr, bs, hl))

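# Conduct experiments: train one network per design point and record test accuracy.
# Note: hidden_layer_sizes=(hl,) builds ONE hidden layer with hl units, so the
# 'HiddenLayers' factor effectively measures the width of that single layer.
# To vary depth instead, pass a tuple with hl entries, e.g. (8,) * hl
# (8 units per layer is an arbitrary illustrative choice).
results = []
for lr, bs, hl in experiments:
    model = MLPClassifier(hidden_layer_sizes=(hl,), learning_rate_init=lr, batch_size=bs, max_iter=200, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results.append((lr, bs, hl, accuracy))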

df = pd.DataFrame(results, columns=['LearningRate', 'BatchSize', 'HiddenLayers', 'Accuracy'])

# Fit a quadratic regression model: linear, squared, and two-way interaction terms.
# Note: the name `model` is reused and now refers to the fitted OLS model.
formula = ('Accuracy ~ LearningRate + BatchSize + HiddenLayers'
           ' + I(LearningRate**2) + I(BatchSize**2) + I(HiddenLayers**2)'
           ' + LearningRate:BatchSize + LearningRate:HiddenLayers + BatchSize:HiddenLayers')
model = ols(formula, data=df).fit()
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:               Accuracy   R-squared:                       0.809
Model:                            OLS   Adj. R-squared:                  0.708
Method:                 Least Squares   F-statistic:                     7.989
Date:                Wed, 29 May 2024   Prob (F-statistic):           0.000140
Time:                        09:36:25   Log-Likelihood:                 23.868
No. Observations:                  27   AIC:                            -27.74
Df Residuals:                      17   BIC:                            -14.78
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                                coef    std err          t      P>|t|      [0.025      0.975]
Intercept                     0.6972      0.266      2.625      0.018       0.137       1.258
LearningRate                 26.1088      7.556      3.455      0.003      10.167      42.051
BatchSize                    -0.0063      0.009     -0.698      0.495      -0.025       0.013
HiddenLayers                  0.1191      0.217      0.550      0.589      -0.338       0.576
I(LearningRate ** 2)       -328.8024     70.223     -4.682      0.000    -476.961    -180.644
I(BatchSize ** 2)          3.617e-05      0.000      0.354      0.728      -0.000       0.000
I(HiddenLayers ** 2)         -0.0259      0.051     -0.504      0.621      -0.134       0.083
LearningRate:BatchSize        0.0325      0.027      1.197      0.248      -0.025       0.090
LearningRate:HiddenLayers     2.9613      0.664      4.458      0.000       1.560       4.363
BatchSize:HiddenLayers        0.0004      0.001      0.283      0.780      -0.003       0.004
Omnibus:                        3.585   Durbin-Watson:                   2.874
Prob(Omnibus):                  0.167   Jarque-Bera (JB):                2.268
Skew:                           0.504   Prob(JB):                        0.322
Kurtosis:                       2.000   Cond. No.                     7.12e+06
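The condition number of 7.12e+06 in the summary hints at numerical ill-conditioning, which is not surprising when raw factors on very different scales (0.001 to 0.1 versus 16 to 64) enter squared and product terms. A common RSM practice is to code each factor to the range -1 to 1 before fitting. The helpers below are a minimal sketch of that idea; the function names and the optional log10 scaling for the learning rate are assumptions made for illustration, not part of the code above.

```python
import math


def code_factor(value, low, high, log_scale=False):
    """Map a value in [low, high] to coded units in [-1, 1]."""
    if log_scale:
        value, low, high = math.log10(value), math.log10(low), math.log10(high)
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


def decode_factor(coded, low, high, log_scale=False):
    """Inverse of code_factor: map coded units back to the original scale."""
    if log_scale:
        low, high = math.log10(low), math.log10(high)
    value = coded * (high - low) / 2 + (high + low) / 2
    return 10 ** value if log_scale else value


# Learning rates 0.001, 0.01, 0.1 are evenly spaced on a log scale
for lr in (0.001, 0.01, 0.1):
    print(lr, "->", code_factor(lr, 0.001, 0.1, log_scale=True))

# Batch sizes 16, 32, 64 are not evenly spaced on a linear scale
for bs in (16, 32, 64):
    print(bs, "->", code_factor(bs, 16, 64))
```

After fitting and optimizing in coded units, `decode_factor` converts the optimum back to real hyperparameter values.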

Analyze response surface

Analyzing the fitted response surface shows how the factors and their interactions affect the response and where the optimum is likely to lie. Common tools include:

  1. Gradient-Based Search: Iteratively moving towards the optimal point by following the slope of the response surface.
  2. Contour Plots: Visual tools to identify optimal regions by plotting the response against two factors while holding others constant.
  3. 3D Surface Plots: Provide a three-dimensional view of the response surface, aiding in visualizing the effect of factors and their interactions.

The code below plots the measured accuracy against learning rate and batch size as a 3D scatter plot, which gives a first view of the surface.

# Analyze the response surface: 3D scatter of the measured accuracies
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['LearningRate'], df['BatchSize'], df['Accuracy'], c='r', marker='o')
ax.set_xlabel('Learning Rate')
ax.set_ylabel('Batch Size')
ax.set_zlabel('Accuracy')
plt.show()


Figure: Response surface (accuracy plotted against learning rate and batch size)

Optimization (Gradient Descent – Simplified)

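We use scipy.optimize.minimize with the bounded L-BFGS-B method (a quasi-Newton method rather than plain gradient descent) to find the combination of hyperparameters that maximizes the accuracy predicted by the fitted regression model. Because minimize only minimizes, the objective function returns the negated prediction.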

# Optimization with L-BFGS-B, a bounded quasi-Newton method
from scipy.optimize import minimize

def objective(params):
    # Negate the predicted accuracy because minimize() minimizes
    lr, bs, hl = params
    point = pd.DataFrame({'LearningRate': [lr], 'BatchSize': [bs], 'HiddenLayers': [hl]})
    return -model.predict(point).iloc[0]

initial_guess = [0.01, 32, 2]
# Bounds keep the search inside the region covered by the experiments
bounds = [(0.001, 0.1), (16, 64), (1, 3)]
result = minimize(objective, initial_guess, bounds=bounds, method='L-BFGS-B')
optimal_params = result.x

print(f'Optimal Learning Rate: {optimal_params[0]}')
print(f'Optimal Batch Size: {optimal_params[1]}')
print(f'Optimal Hidden Layers: {optimal_params[2]}')


Optimal Learning Rate: 0.0540041673343001
Optimal Batch Size: 16.0
Optimal Hidden Layers: 3.0
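The optimizer treats all three factors as continuous, so in general it can return fractional values for factors that must be integers, such as batch size. One simple remedy, sketched below, is to evaluate the fitted model at the neighbouring integer values and keep the best feasible candidate. The function name and the toy surrogate are hypothetical and only illustrate the idea.

```python
import math
from itertools import product


def snap_to_integers(continuous, integer_positions, surrogate, bounds):
    """Round the integer-valued entries of a continuous optimum.

    Tries the floor and ceiling of each integer-valued entry (clipped to its
    bounds) and returns the candidate with the highest surrogate value.
    """
    choices = []
    for i, value in enumerate(continuous):
        if i in integer_positions:
            low, high = bounds[i]
            candidates = {min(max(c, low), high)
                          for c in (math.floor(value), math.ceil(value))}
            choices.append(sorted(candidates))
        else:
            choices.append([value])
    return max(product(*choices), key=surrogate)


# Hypothetical surrogate with its continuous optimum at (0.05, 20.4, 2.6)
def toy_surrogate(p):
    lr, bs, hl = p
    return -(lr - 0.05) ** 2 - 0.001 * (bs - 20.4) ** 2 - 0.01 * (hl - 2.6) ** 2


best = snap_to_integers(
    continuous=(0.05, 20.4, 2.6),
    integer_positions={1, 2},
    surrogate=toy_surrogate,
    bounds=[(0.001, 0.1), (16, 64), (1, 3)],
)
print(best)
```

Whatever setting the surrogate suggests, it is worth confirming with an actual training run, since the fitted surface is only an approximation.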

Use-Cases and Applications for Response Surface Methodology

RSM is applied in various fields to improve processes and products:

  1. Engineering: Optimizing machining parameters, material properties, and manufacturing processes.
  2. Pharmaceuticals: Developing new drugs, optimizing formulations, and enhancing production methods.
  3. Food Science: Designing new food products, improving recipes, and optimizing processing conditions.
  4. Chemistry: Enhancing chemical reactions, increasing yields, and improving purity.

Advantages and Limitations of Response Surface Methodology


Advantages

  • Efficient exploration of factor effects with fewer experiments.
  • Ability to model complex relationships and interactions.
  • Provides clear visualizations and optimization strategies.


Limitations

  • Assumes a well-behaved response surface, which may not always be true.
  • Requires careful experimental design and analysis to avoid misleading conclusions.
  • May be resource-intensive for a large number of factors and levels.


Conclusion

Response Surface Methodology is a versatile tool for optimizing processes, improving product quality, and tuning machine learning models. By systematically exploring the relationships between multiple factors and a response, RSM helps identify optimal conditions and make informed decisions. Despite its limitations, its combination of designed experiments, an interpretable fitted model, and direct optimization makes it valuable in both research and industry.

