Implementing Hyperparameter Tuning in a Decision Tree

Install the required libraries (the code below also uses scikit-learn and SciPy):

pip install scikit-learn scipy bayesian-optimization

Importing Libraries and Loading the Dataset

For the implementation of all three methods we will use the California Housing Prices dataset, which contains 20,640 samples with 8 numeric features; the target is the median house value.

Python3
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.metrics import mean_squared_error
from scipy.stats import randint
from bayes_opt import BayesianOptimization

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Grid Search

Let's walk through how hyperparameters are tuned in a decision tree using grid search.

  • Defining the parameter grid: We define a dictionary named param_grid whose keys are hyperparameters of the decision tree regressor, such as max_depth, min_samples_split, and min_samples_leaf. Each key maps to a list of candidate values to be tested during the grid search.
  • Creating a regressor instance and performing the grid search: An instance of the DecisionTreeRegressor class is created; it is used to fit the data and evaluate each hyperparameter combination. GridSearchCV performs the grid search with k-fold cross-validation: for every combination in the parameter grid it trains k models, each time using k-1 folds as training data and the remaining fold as validation data, and records the validation score.
  • Hyperparameter selection: Once all the models have been trained and evaluated, the grid search selects the combination of hyperparameters that yields the best average performance across the k folds. That combination is taken as the optimal set of hyperparameters for the model.
Python3
# Define the parameter grid to tune the hyperparameters
param_grid = {
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
dtree_reg = DecisionTreeRegressor(random_state=42) # Initialize a decision tree regressor
grid_search = GridSearchCV(estimator=dtree_reg, param_grid=param_grid, 
                           cv=5, n_jobs=-1, verbose=2, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)
best_dtree_reg = grid_search.best_estimator_ # Get the best estimator from the grid search
y_pred = best_dtree_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5
best_params = grid_search.best_params_
print(f"Best parameters: {best_params}")
print(f"Test RMSE: {rmse}")

Output:

Best parameters: {'max_depth': 10, 'min_samples_leaf': 4, 'min_samples_split': 2}
Test RMSE: 0.6390654005312799

The hyperparameters shown are the combination that produced the best average cross-validated performance in the grid search. The test RMSE reports how the regressor, refit with those best parameters, performs on the held-out test set.
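If you want to see how every combination fared rather than just the winner, GridSearchCV exposes its per-combination cross-validation results through the cv_results_ attribute. Below is a minimal sketch, assuming the fitted grid_search object from above is still in scope (pandas is used only for display):

Python3
import pandas as pd

# Collect the per-combination cross-validation results into a DataFrame
results = pd.DataFrame(grid_search.cv_results_)

# Convert the negated MSE scores back to RMSE for readability
results['mean_rmse'] = (-results['mean_test_score']) ** 0.5

# Show the five best combinations, best first
print(results.sort_values('mean_rmse')[['params', 'mean_rmse']].head())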

Random Search

The implementation of hyperparameter tuning with random search is shown below. Random search follows the same overall workflow as grid search, so the setup is similar. Instead of exhaustively evaluating every combination, RandomizedSearchCV samples a fixed number (n_iter) of hyperparameter combinations at random from the specified parameter distributions. Once the search completes, the best sampled combination is reported and the model is refit with it so that predictions can be made.

Python3
# Define the parameter distribution to sample from
param_dist = {
    'max_depth': randint(1, 20),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 20)
}
dtree_reg = DecisionTreeRegressor(random_state=42)
random_search = RandomizedSearchCV(dtree_reg, param_distributions=param_dist, 
                                   n_iter=100, cv=5, random_state=42)
random_search.fit(X_train, y_train)
best_params_random = random_search.best_params_
best_score_random = random_search.best_score_

print(f"Best Parameters (Random Search): {best_params_random}")
print(f"Best Score (Random Search): {best_score_random}")

Output:

Best Parameters (Random Search): {'max_depth': 16, 'min_samples_leaf': 16, 'min_samples_split': 2}
Best Score (Random Search): 0.7301785873565848
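Note that the best score above is the mean cross-validated R² score (since no scoring argument was passed, RandomizedSearchCV falls back to the regressor's default scorer), so it is not directly comparable to the grid-search RMSE. To compare the two, you can evaluate the refit best estimator on the held-out test set. A minimal sketch, assuming the fitted random_search object from above:

Python3
# Evaluate the refit best estimator on the held-out test set
best_dtree_random = random_search.best_estimator_
y_pred_random = best_dtree_random.predict(X_test)
rmse_random = mean_squared_error(y_test, y_pred_random) ** 0.5
print(f"Test RMSE (Random Search): {rmse_random}")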

To understand the difference between grid search and randomized search, please refer to: Grid search vs randomized search

Bayesian Optimization

Let's now walk through how hyperparameters are tuned in a decision tree using Bayesian optimization.

  • Defining the search space: We define a dictionary named param_bounds, analogous to the param_grid we defined earlier for grid and random search. Instead of discrete candidate lists, it holds a continuous (lower, upper) range for each hyperparameter.
  • Initializing Bayesian optimization: When we initialize the BayesianOptimization object, we configure the framework to explore the hyperparameter space efficiently, using the results of previous evaluations to decide which combination to try next.
  • Acquiring the best hyperparameters and score: After the optimization finishes, the best set of hyperparameters and its corresponding best score are read from optimizer.max['params'] and optimizer.max['target'].
Python3
# Define the function to optimize using cross-validation
def dtree_cv(max_depth, min_samples_split, min_samples_leaf):
    # Define the model with the parameters to be optimized
    estimator = DecisionTreeRegressor(
        max_depth=int(max_depth),
        min_samples_split=int(min_samples_split),
        min_samples_leaf=int(min_samples_leaf),
        random_state=2
    )
    # 'neg_mean_squared_error' returns the negated MSE, so larger values (closer to zero) are better
    cval = cross_val_score(estimator, X_train, y_train, scoring='neg_mean_squared_error', cv=5)
    return cval.mean()  # The optimizer maximizes this value, which minimizes the MSE

# Define the parameter bounds
param_bounds = {
    'max_depth': (1, 20),
    'min_samples_split': (2, 20),
    'min_samples_leaf': (1, 20)
}

optimizer = BayesianOptimization(
    f=dtree_cv,
    pbounds=param_bounds,
    random_state=1,
)

optimizer.maximize(init_points=5, n_iter=25)  # 5 random probes, then 25 guided optimization steps
best_params_bayes = optimizer.max['params']
best_params_bayes['max_depth'] = int(best_params_bayes['max_depth'])
best_params_bayes['min_samples_split'] = int(best_params_bayes['min_samples_split'])
best_params_bayes['min_samples_leaf'] = int(best_params_bayes['min_samples_leaf'])
best_score_bayes = optimizer.max['target']

print(f"Best Parameters (Bayesian Optimization): {best_params_bayes}")
print(f"Best Score (Bayesian Optimization): {best_score_bayes}")

Output:

Best Parameters (Bayesian Optimization): {'max_depth': 18, 'min_samples_leaf': 16, 'min_samples_split': 20}
Best Score (Bayesian Optimization): -0.36047878315909154

Because we passed the negated MSE to the optimizer (turning minimization into maximization), the reported score is also negated. A score closer to zero therefore corresponds to a lower MSE and better performance. Here the best score of -0.36047878315909154 corresponds to a cross-validated MSE of about 0.36 (RMSE ≈ 0.60), which is comparable to the grid-search result.
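To obtain a test-set number directly comparable to the grid-search section, you can refit a regressor with the best parameters found and evaluate it on the held-out data. A minimal sketch, assuming best_params_bayes and the train/test split from above are in scope:

Python3
# Refit a tree with the best Bayesian-optimized parameters on the training set
best_dtree_bayes = DecisionTreeRegressor(random_state=2, **best_params_bayes)
best_dtree_bayes.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred_bayes = best_dtree_bayes.predict(X_test)
rmse_bayes = mean_squared_error(y_test, y_pred_bayes) ** 0.5
print(f"Test RMSE (Bayesian Optimization): {rmse_bayes}")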

Conclusion

Hyperparameter tuning plays a crucial role in optimizing decision tree models for enhanced accuracy, generalization, and robustness. We have explored techniques such as grid search, random search, and Bayesian optimization, which efficiently navigate the hyperparameter space to find an effective configuration.
