CatBoost Regularization Parameters

CatBoost offers several regularization parameters, each designed to control a specific aspect of model complexity. Let’s explore some of the most commonly used CatBoost regularization parameters:

1. L2 Regularization (reg_lambda)

L2 regularization, also known as ridge regularization, adds a penalty term to the loss function based on the squared L2 norm of the model’s weights. This discourages the model from assigning too much importance to any single feature. The reg_lambda parameter (an alias of l2_leaf_reg in CatBoost’s Python API) controls the strength of this regularization; higher values lead to stronger regularization.

L(θ) = L₀(θ) + λ∥θ∥₂²

Here, L₀(θ) is the original loss function, λ is the regularization strength, and ∥θ∥₂² is the squared L2 norm of the model parameters.
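As a minimal sketch of setting this penalty in practice (the dataset here is synthetic and purely illustrative):

```python
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression

# Synthetic data purely for illustration
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# reg_lambda (alias of l2_leaf_reg) sets the strength of the L2 penalty;
# higher values mean stronger regularization
model = CatBoostRegressor(reg_lambda=3.0, iterations=200, verbose=0)
model.fit(X, y)
```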

2. L1 Regularization (reg_alpha)

L1 regularization, also known as lasso regularization, adds a penalty term based on the L1 norm of the model’s weights. It encourages feature selection by pushing some weights to exactly zero. The reg_alpha parameter controls the strength of this regularization.

L(θ) = L₀(θ) + λ∥θ∥₁

Here, L₀(θ) is the original loss function, λ is the regularization strength, and ∥θ∥₁ is the L1 norm of the model parameters.
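To make the difference between the two penalties concrete, here is a small NumPy sketch that evaluates both terms on the same weight vector (the weights and λ are arbitrary illustrative values):

```python
import numpy as np

theta = np.array([0.5, -1.2, 0.0, 3.1])  # illustrative model weights
lam = 0.1                                # regularization strength

l1_penalty = lam * np.sum(np.abs(theta))  # lambda * ||theta||_1 = 0.48
l2_penalty = lam * np.sum(theta ** 2)     # lambda * ||theta||_2^2 = 1.13

print(f"L1 penalty: {l1_penalty:.2f}")
print(f"L2 penalty: {l2_penalty:.2f}")
```

The L1 penalty grows linearly with each weight while the L2 penalty grows quadratically, which is why L1 tends to push small weights to exactly zero and L2 tends to shrink large ones.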

3. Max Depth (max_depth)

The max_depth parameter controls the maximum depth of trees in the CatBoost ensemble. Limiting tree depth is a form of regularization, as it prevents the model from creating overly complex trees that can fit noise in the data.

T(x) = Σₖ fₖ · 1(x ∈ Rₖ)

Here, T(x) is the tree’s prediction, d is the depth (a tree of depth d has at most 2^d leaf regions), Rₖ represents the regions defined by the decision nodes, and fₖ are the values associated with each region.
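A minimal sketch of limiting depth (in CatBoost’s Python API, max_depth is an alias of the native depth parameter):

```python
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

# Shallower trees capture fewer feature interactions, which acts
# as regularization (max_depth is an alias of CatBoost's depth)
shallow = CatBoostRegressor(max_depth=4, iterations=200, verbose=0)
shallow.fit(X, y)
```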

4. Min Child Samples (min_child_samples)

This parameter sets the minimum number of samples a node must contain: CatBoost does not search for further splits in nodes with fewer samples. Increasing min_child_samples can prevent the model from overfitting by ensuring that a split is made only when it is supported by enough data.

n ≥ nₘᵢₙ

Here, n is the number of samples in a node, and nₘᵢₙ is the specified minimum number of samples required for the node to be split further.
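A hedged usage sketch: in recent CatBoost versions, min_child_samples (an alias of min_data_in_leaf) takes effect only with the Depthwise or Lossguide growing policies, so the example sets grow_policy explicitly:

```python
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, random_state=42)

# Nodes with fewer than 20 samples are not split further;
# min_child_samples is an alias of min_data_in_leaf and requires
# the Depthwise or Lossguide grow_policy
model = CatBoostRegressor(
    grow_policy="Depthwise",
    min_child_samples=20,
    iterations=200,
    verbose=0,
)
model.fit(X, y)
```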

5. Colsample Bylevel (colsample_bylevel) and Colsample Bytree (colsample_bytree)

These parameters control the fraction of features to consider when building each level of a tree (colsample_bylevel) and each tree in the ensemble (colsample_bytree). Reducing these values can add regularization by making the model less sensitive to individual features; a short usage sketch follows the list below.

  • The colsample_bylevel parameter controls the fraction of features to be randomly chosen for each level in every tree.
  • The colsample_bytree parameter controls the fraction of features to be randomly chosen for each tree.
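As a minimal sketch (note that in CatBoost’s Python API, colsample_bylevel is an alias of rsm; colsample_bytree is familiar from other boosting libraries such as XGBoost and may not be accepted by every CatBoost version):

```python
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=20, random_state=42)

# Consider a random 70% of the features at each tree level
# (colsample_bylevel is an alias of CatBoost's rsm parameter)
model = CatBoostRegressor(colsample_bylevel=0.7, iterations=200, verbose=0)
model.fit(X, y)
```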

6. rsm (Random Subspace Method)

The rsm parameter specifies the fraction of features to be randomly sampled when each split is selected. Introducing randomness into feature selection is a form of regularization: it prevents the model from relying too heavily on specific features, enhancing generalization by making the model more robust to variations in the dataset.

The random selection rate, denoted by p, determines how many features are sampled:

m′ = p · m

Here, m is the total number of features, and m′ is the number of features randomly selected at each split.
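A small sketch of the selection arithmetic together with setting rsm on a model (the feature count and rate are illustrative):

```python
from catboost import CatBoostRegressor
from sklearn.datasets import make_regression

m = 20                   # total number of features
p = 0.8                  # random selection rate (rsm)
m_selected = int(p * m)  # m' = p * m = 16 features per split selection

X, y = make_regression(n_samples=500, n_features=m, random_state=42)
model = CatBoostRegressor(rsm=p, iterations=200, verbose=0)
model.fit(X, y)
```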

7. leaf_estimation_method

This parameter determines the method used to calculate values in leaves. Setting it to ‘Newton’ enables the Newton–Raphson method for leaf value calculation, which incorporates second-order (Hessian) information and can provide better generalization and regularization.

fₖ = −gₖ / (Hₖ + λ)

Here, fₖ is the leaf value, gₖ is the sum of first-order gradients over the samples in leaf k, Hₖ is the corresponding sum of second-order derivatives (the Hessian), and λ is the regularization term.
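A minimal usage sketch on a synthetic classification task:

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 'Newton' uses second-order (Hessian) information when estimating
# leaf values; 'Gradient' uses first-order information only
model = CatBoostClassifier(
    leaf_estimation_method="Newton",
    iterations=200,
    verbose=0,
)
model.fit(X, y)
```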
