Implementing Hyperparameter Tuning in a Decision Tree

Install the required libraries (the code below also uses scikit-learn and SciPy):

pip install scikit-learn scipy bayesian-optimization

Importing Libraries and Loading the Dataset

For the implementation of all three methods we will use the California Housing Prices dataset, which contains 20,640 samples with 8 numeric features; the target is the median house value.

Python3
from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.metrics import mean_squared_error
from scipy.stats import randint
from bayes_opt import BayesianOptimization

data = fetch_california_housing()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Grid Search

Let's walk through how hyperparameters are tuned in a decision tree using grid search.

  • Defining the parameter grid: We define a dictionary named param_grid whose keys are hyperparameters of the decision tree regressor, such as max_depth, min_samples_split, and min_samples_leaf. Each key maps to a list of candidate values to be tested during the grid search.
  • Creating a regressor instance and performing the grid search: An instance of the DecisionTreeRegressor class is created; it is used to fit the data and evaluate each hyperparameter combination. GridSearchCV performs the grid search with k-fold cross-validation: for every combination in the parameter grid it trains k models, each time using k-1 folds as training data and the remaining fold as validation data, and records the validation score.
  • Hyperparameter selection: Once all the models have been trained and evaluated, the grid search selects the combination of hyperparameters that yields the best average performance across the k folds. That combination is taken as the optimal set of hyperparameters for the model.
Python3
# Define the parameter grid to tune the hyperparameters
param_grid = {
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
dtree_reg = DecisionTreeRegressor(random_state=42) # Initialize a decision tree regressor
grid_search = GridSearchCV(estimator=dtree_reg, param_grid=param_grid, 
                           cv=5, n_jobs=-1, verbose=2, scoring='neg_mean_squared_error')
grid_search.fit(X_train, y_train)
best_dtree_reg = grid_search.best_estimator_ # Get the best estimator from the grid search
y_pred = best_dtree_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5
best_params = grid_search.best_params_
print(f"Best parameters: {best_params}")
print(f"Test RMSE: {rmse}")

Output:

Best parameters: {'max_depth': 10, 'min_samples_leaf': 4, 'min_samples_split': 2}
Test RMSE: 0.6390654005312799

The hyperparameters shown are the combination that produced the best average cross-validated performance in the grid search. The test RMSE reports how the regressor, refit with those best parameters, performs on the held-out test set.
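If you want to see how every combination fared rather than just the winner, GridSearchCV exposes its per-combination cross-validation results through the cv_results_ attribute. Below is a minimal sketch, assuming the fitted grid_search object from above is still in scope (pandas is used only for display):

Python3
import pandas as pd

# Collect the per-combination cross-validation results into a DataFrame
results = pd.DataFrame(grid_search.cv_results_)

# Convert the negated MSE scores back to RMSE for readability
results['mean_rmse'] = (-results['mean_test_score']) ** 0.5

# Show the five best combinations, best first
print(results.sort_values('mean_rmse')[['params', 'mean_rmse']].head())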

Random Search

The implementation of hyperparameter tuning with random search is shown below. Random search follows the same overall workflow as grid search, so the setup is similar. Instead of exhaustively evaluating every combination, RandomizedSearchCV samples a fixed number (n_iter) of hyperparameter combinations at random from the specified parameter distributions. Once the search completes, the best sampled combination is reported and the model is refit with it so that predictions can be made.

Python3
# Define the parameter distribution to sample from
param_dist = {
    'max_depth': randint(1, 20),
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 20)
}
dtree_reg = DecisionTreeRegressor(random_state=42)
random_search = RandomizedSearchCV(dtree_reg, param_distributions=param_dist, 
                                   n_iter=100, cv=5, random_state=42)
random_search.fit(X_train, y_train)
best_params_random = random_search.best_params_
best_score_random = random_search.best_score_

print(f"Best Parameters (Random Search): {best_params_random}")
print(f"Best Score (Random Search): {best_score_random}")

Output:

Best Parameters (Random Search): {'max_depth': 16, 'min_samples_leaf': 16, 'min_samples_split': 2}
Best Score (Random Search): 0.7301785873565848
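Note that the best score above is the mean cross-validated R² score (since no scoring argument was passed, RandomizedSearchCV falls back to the regressor's default scorer), so it is not directly comparable to the grid-search RMSE. To compare the two, you can evaluate the refit best estimator on the held-out test set. A minimal sketch, assuming the fitted random_search object from above:

Python3
# Evaluate the refit best estimator on the held-out test set
best_dtree_random = random_search.best_estimator_
y_pred_random = best_dtree_random.predict(X_test)
rmse_random = mean_squared_error(y_test, y_pred_random) ** 0.5
print(f"Test RMSE (Random Search): {rmse_random}")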

To understand the difference between grid search and randomized search, please refer to: Grid search vs randomized search

Bayesian Optimization

Let's now walk through how hyperparameters are tuned in a decision tree using Bayesian optimization.

  • Defining the search space: We define a dictionary named param_bounds, analogous to the param_grid we defined earlier for grid and random search. Instead of discrete candidate lists, it holds a continuous (lower, upper) range for each hyperparameter.
  • Initializing Bayesian optimization: When we initialize the BayesianOptimization object, we configure the framework to explore the hyperparameter space efficiently, using the results of previous evaluations to decide which combination to try next.
  • Acquiring the best hyperparameters and score: After the optimization finishes, the best set of hyperparameters and its corresponding best score are read from optimizer.max['params'] and optimizer.max['target'].
Python3
# Define the function to optimize using cross-validation
def dtree_cv(max_depth, min_samples_split, min_samples_leaf):
    # Define the model with the parameters to be optimized
    estimator = DecisionTreeRegressor(
        max_depth=int(max_depth),
        min_samples_split=int(min_samples_split),
        min_samples_leaf=int(min_samples_leaf),
        random_state=2
    )
    # 'neg_mean_squared_error' returns the negated MSE, so larger values (closer to zero) are better
    cval = cross_val_score(estimator, X_train, y_train, scoring='neg_mean_squared_error', cv=5)
    return cval.mean()  # The optimizer maximizes this value, which minimizes the MSE

# Define the parameter bounds
param_bounds = {
    'max_depth': (1, 20),
    'min_samples_split': (2, 20),
    'min_samples_leaf': (1, 20)
}

optimizer = BayesianOptimization(
    f=dtree_cv,
    pbounds=param_bounds,
    random_state=1,
)

optimizer.maximize(init_points=5, n_iter=25)  # 5 random probes, then 25 guided optimization steps
best_params_bayes = optimizer.max['params']
best_params_bayes['max_depth'] = int(best_params_bayes['max_depth'])
best_params_bayes['min_samples_split'] = int(best_params_bayes['min_samples_split'])
best_params_bayes['min_samples_leaf'] = int(best_params_bayes['min_samples_leaf'])
best_score_bayes = optimizer.max['target']

print(f"Best Parameters (Bayesian Optimization): {best_params_bayes}")
print(f"Best Score (Bayesian Optimization): {best_score_bayes}")

Output:

Best Parameters (Bayesian Optimization): {'max_depth': 18, 'min_samples_leaf': 16, 'min_samples_split': 20}
Best Score (Bayesian Optimization): -0.36047878315909154

Because we passed the negated MSE to the optimizer (turning minimization into maximization), the reported score is also negated. A score closer to zero therefore corresponds to a lower MSE and better performance. Here the best score of -0.36047878315909154 corresponds to a cross-validated MSE of about 0.36 (RMSE ≈ 0.60), which is comparable to the grid-search result.
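To obtain a test-set number directly comparable to the grid-search section, you can refit a regressor with the best parameters found and evaluate it on the held-out data. A minimal sketch, assuming best_params_bayes and the train/test split from above are in scope:

Python3
# Refit a tree with the best Bayesian-optimized parameters on the training set
best_dtree_bayes = DecisionTreeRegressor(random_state=2, **best_params_bayes)
best_dtree_bayes.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred_bayes = best_dtree_bayes.predict(X_test)
rmse_bayes = mean_squared_error(y_test, y_pred_bayes) ** 0.5
print(f"Test RMSE (Bayesian Optimization): {rmse_bayes}")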

Conclusion

Hyperparameter tuning plays a crucial role in optimizing decision tree models for enhanced accuracy, generalization, and robustness. We have explored techniques such as grid search, random search, and Bayesian optimization, which efficiently navigate the hyperparameter space to find an effective configuration.
