Single Estimator Versus Bagging: Bias-Variance Decomposition in Scikit-Learn
The bias-variance decomposition lets you compare a single estimator with a bagging ensemble built from it. Using the bias_variance_decomp function from the mlxtend library, we can estimate the average expected loss, average bias, and average variance for both approaches. A single estimator is subject to the bias-variance trade-off: increasing model complexity tends to decrease bias but increase variance. Bagging averages the predictions of models trained on several bootstrap samples, which reduces variance and usually lowers the total error. The decomposition helps in choosing a strategy for a given problem by separating a model's bias (its systematic error, i.e. the gap between its average prediction and the true value) from its variance (its sensitivity to fluctuations in the training data).
Single Estimator
A single estimator is one machine learning model that is trained on one set of training data and then used to generate predictions on new data. Examples in Scikit-Learn include decision trees, logistic regression, support vector machines, and linear regression. Depending on the model, a single estimator can handle classification, regression, or both: decision trees and support vector machines have variants for each task, while logistic regression is a classifier and linear regression is a regressor.
Bagging
Bagging (short for bootstrap aggregating) is a technique that improves the performance of machine learning models by combining the predictions of many models, each trained on a different bootstrap sample of the training data. The process is:
- Randomly sample the training data with replacement to construct several equal-sized subsets (bootstrap samples).
- Train a separate model on each subset, creating an ensemble of models.
- For new data, let each model in the ensemble make its own prediction.
- Produce the final prediction by averaging the predictions of all models in the ensemble (for regression) or by voting across them (for classification).
Random forests and bagged decision trees are two examples of bagged models in Scikit-Learn.
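The four steps above can be sketched by hand for a regression problem. This is a minimal illustration, not how Scikit-Learn implements BaggingRegressor internally; the helper name bagged_predict and the noisy sine-curve data are hypothetical and chosen only for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def bagged_predict(X_train, y_train, X_new, n_models=25, seed=0):
    """Hypothetical helper: train trees on bootstrap samples, average them."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    all_preds = np.empty((n_models, X_new.shape[0]))
    for m in range(n_models):
        # Step 1: bootstrap sample, same size as the training set, with replacement
        idx = rng.integers(0, n, size=n)
        # Step 2: fit one model per bootstrap sample
        model = DecisionTreeRegressor(random_state=m)
        model.fit(X_train[idx], y_train[idx])
        # Step 3: each model makes its own prediction on the new data
        all_preds[m] = model.predict(X_new)
    # Step 4: aggregate by averaging (regression)
    return all_preds.mean(axis=0), all_preds


# Toy data (assumption for illustration): a noisy sine curve
rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_new = np.linspace(0, 6, 50).reshape(-1, 1)

bagged, per_model = bagged_predict(X, y, X_new)
print(bagged.shape, per_model.shape)
```

For classification, the last step would take a vote across the models instead of a mean.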
Bias-Variance Decomposition
Bias:
Bias is the deviation of a model's expected prediction from the true value it is estimating. Equivalently, it is the average difference between the predicted and the actual target value:
$$\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$$
where
- $\hat{\theta}$ is the estimate or prediction,
- $\mathbb{E}[\hat{\theta}]$ is the expected value of the estimator,
- $\theta$ is the true or population value of the parameter being estimated.
Variance:
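Variance is the variability of a model's estimates or predictions across different training datasets. It measures the model's sensitivity to the particular samples in the training data:
$$\operatorname{Var}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]$$
where
- $\hat{\theta}$ is the value predicted by the model,
- $\mathbb{E}[\hat{\theta}]$ is the mean of the predicted values, also known as the expected value,
- $(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2$ is the squared difference between the predicted and expected value.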
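Both definitions can be checked numerically by drawing many training datasets and applying the same estimator to each. The sketch below is an illustration under assumed settings (normally distributed data, the divide-by-n variance estimator as the model); every variable name is hypothetical. In theory this estimator has bias $-\sigma^2/n$, which is -0.4 for the values chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0             # theta: the true population variance (sigma = 2)
n, n_datasets = 10, 20000  # points per dataset, number of simulated datasets

# Each row is one training dataset of n points drawn from N(0, sigma^2)
samples = rng.normal(loc=0.0, scale=2.0, size=(n_datasets, n))

# theta_hat: the divide-by-n variance estimator, computed once per dataset
estimates = samples.var(axis=1, ddof=0)

# Bias: E[theta_hat] - theta
bias = estimates.mean() - true_var
# Variance: E[(theta_hat - E[theta_hat])^2]
variance = np.mean((estimates - estimates.mean()) ** 2)
# Mean squared error of the estimator around the true value
mse = np.mean((estimates - true_var) ** 2)

print(f"bias          = {bias:.3f}")
print(f"variance      = {variance:.3f}")
print(f"bias^2 + var  = {bias**2 + variance:.3f}")
print(f"mse           = {mse:.3f}")
```

The last two printed values coincide, which is the squared-error form of the decomposition.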
When building a model, it is ideal to pick one with low bias and low variance.
A high-variance model has typically overfit the training data and is unlikely to generalize well to future data, whereas a high-bias model is underfitting, i.e. it has not captured the underlying structure of the data.
Bias-variance decomposition: A model's ability to generalize to new data can suffer from either high bias or high variance. A bias-variance decomposition helps identify which of the two is the problem by splitting the model's expected error into a bias component and a variance component.
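For squared error the split can be derived in a few lines. The derivation below writes $\hat{\theta}$ for the prediction, $\mathbb{E}[\hat{\theta}]$ for its expected value over training datasets, and $\theta$ for the true value. It assumes the target is observed without noise; with noisy targets an additional irreducible-error term appears that no model can remove.

```latex
\begin{aligned}
\mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  &= \mathbb{E}\big[\big((\hat{\theta} - \mathbb{E}[\hat{\theta}])
       + (\mathbb{E}[\hat{\theta}] - \theta)\big)^2\big] \\
  &= \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]
     + 2\,(\mathbb{E}[\hat{\theta}] - \theta)\,
       \underbrace{\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big]}_{=\,0}
     + (\mathbb{E}[\hat{\theta}] - \theta)^2 \\
  &= \underbrace{\mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]}_{\text{variance}}
     + \underbrace{(\mathbb{E}[\hat{\theta}] - \theta)^2}_{\text{bias}^2}
\end{aligned}
```

The cross term vanishes because $\mathbb{E}[\hat{\theta}] - \theta$ is a constant and the deviation of $\hat{\theta}$ from its own mean averages to zero.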
Decision trees are an example of a single estimator that may have high bias (when the tree is kept shallow) or high variance (when it is grown deep).
Bagging can lower a model’s variance and enhance generalization performance.
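Why averaging lowers variance can be seen in an idealized model: if B predictors each have variance $\sigma^2$ and every pair has correlation $\rho$, the variance of their average is $\rho\sigma^2 + (1-\rho)\sigma^2/B$. The simulation below is a sketch under exactly that assumption (real bagged trees only approximate it), and all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, rho, B, n_trials = 1.0, 0.3, 25, 100_000

# Build B predictions per trial with variance sigma^2 and pairwise correlation rho:
# a component shared by all predictors plus an independent component for each one.
shared = rng.normal(scale=sigma, size=(n_trials, 1))
individual = rng.normal(scale=sigma, size=(n_trials, B))
preds = np.sqrt(rho) * shared + np.sqrt(1 - rho) * individual

single_var = preds[:, 0].var()         # variance of one predictor
bagged_var = preds.mean(axis=1).var()  # variance of the average of B predictors
theory = rho * sigma**2 + (1 - rho) * sigma**2 / B

print(f"single predictor variance: {single_var:.3f}")
print(f"averaged variance:         {bagged_var:.3f}")
print(f"theoretical value:         {theory:.3f}")
```

The shared component is what limits the gain: as B grows the variance approaches $\rho\sigma^2$ rather than zero, which is why less correlated base models benefit more from bagging.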
Example 1:
Prerequisites: Install mlxtend library for Bias-variance Decomposition
!pip install mlxtend --upgrade
Step 1: Import the necessary packages and Load the datasets
Python3
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from mlxtend.evaluate import bias_variance_decomp

# Load the dataset
X, y = fetch_california_housing(return_X_y=True)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=23, test_size=0.25, shuffle=True)
For Single Estimator
Step 2: Find the bias & variance using Single Estimator Decision Tree
Python3
# Build a decision tree model on the training data and obtain predictions on the test data
decision_tree = DecisionTreeRegressor(criterion='absolute_error',
                                      min_samples_leaf=3)
decision_tree.fit(X_train, y_train)
y_hat_pop_tree = decision_tree.predict(X_test)

# Estimate the decomposition over 20 bootstrap rounds using squared-error loss
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    decision_tree, X_train, y_train, X_test, y_test,
    num_rounds=20, loss='mse', random_seed=23)

print('For single Estimator')
print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)
Output:
For single Estimator
Average expected loss: 0.527
Average bias: 0.266
Average variance: 0.261
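To make the three reported numbers less of a black box, the sketch below recomputes a squared-error decomposition by hand: refit the model on bootstrap samples of the training set, collect the test predictions, and compare them with their per-point average. It reflects our understanding of the general procedure rather than mlxtend's actual source; the function name manual_bias_variance_mse and the toy sine data are hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeRegressor


def manual_bias_variance_mse(estimator, X_train, y_train, X_test, y_test,
                             num_rounds=20, seed=0):
    """Hypothetical helper: bootstrap-based decomposition of the MSE."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    all_pred = np.empty((num_rounds, X_test.shape[0]))
    for r in range(num_rounds):
        # Refit a fresh copy of the model on a bootstrap sample of the training set
        idx = rng.integers(0, n, size=n)
        model = clone(estimator).fit(X_train[idx], y_train[idx])
        all_pred[r] = model.predict(X_test)

    main_pred = all_pred.mean(axis=0)                # average prediction per test point
    avg_loss = np.mean((all_pred - y_test) ** 2)     # expected squared error
    avg_bias = np.mean((main_pred - y_test) ** 2)    # squared bias
    avg_var = np.mean((all_pred - main_pred) ** 2)   # variance around the average
    return avg_loss, avg_bias, avg_var


# Toy data (assumption for illustration): a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

loss, bias_sq, var = manual_bias_variance_mse(
    DecisionTreeRegressor(random_state=0), X_tr, y_tr, X_te, y_te)
print(f"loss={loss:.3f}  bias^2={bias_sq:.3f}  variance={var:.3f}")
```

With this construction the loss equals the squared bias plus the variance, which matches the output above (0.266 + 0.261 = 0.527).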
Step 4: Plot the Bias & Variance plot for Single Estimator
Python3
labels = ['Expected Loss', 'Bias^2', 'Variance']
values = [avg_expected_loss, avg_bias, avg_var]

plt.bar(labels, values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition for Single Estimator')
plt.show()
Output: bar chart of the expected loss, squared bias, and variance for the single estimator.
For Bagging
Step 5: Find the bias & variance using Bagging
Python3
# Build a bagging model on the training data and obtain predictions on the test data
bagging = BaggingRegressor()
bagging.fit(X_train, y_train)
y_hat_pop_bagging = bagging.predict(X_test)

# Estimate the decomposition over 10 bootstrap rounds using squared-error loss
avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    bagging, X_train, y_train, X_test, y_test,
    num_rounds=10, loss='mse', random_seed=23)

print('For bagging model')
print('Average expected loss: %.3f' % avg_expected_loss)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)
Output:
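For bagging model
Average expected loss: 3303.124
Average bias: 2714.084
Average variance: 589.040
Note: these values are several orders of magnitude larger than the single-estimator results, even though both models are scored with MSE on the same test set, so they appear to come from a run on a different dataset. Rerun the code to obtain values that can be compared directly.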
Step 6: Plot the Bias & Variance plot for Bagging
Python3
labels = ['Expected Loss', 'Bias^2', 'Variance']
values = [avg_expected_loss, avg_bias, avg_var]

plt.bar(labels, values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition for Bagging')
plt.show()
Output: bar chart of the expected loss, squared bias, and variance for the bagging model.
Example 2: Classification
Steps:
- Load the necessary packages
- Load the Iris dataset
- Split train and test datasets
- Compute bias and variance using a Single Estimator Decision Tree
- Compute bias and variance using Bagging
- Plot the bias-variance decomposition for both
Python3
# Load the necessary packages
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bias_variance_decomp
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the dataset
X, y = load_iris(return_X_y=True)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=104, test_size=0.25, shuffle=True)

# Build a decision tree model and obtain predictions on the test data
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
y_hat_pop_tree = tree.predict(X_test)

y_error, avg_bias, avg_var = bias_variance_decomp(
    tree, X_train, y_train, X_test, y_test,
    loss='0-1_loss', random_seed=23)

print('Using Single Estimator')
print('Average expected loss: %.3f' % y_error)
print('Average bias: %.3f' % avg_bias)
print('Average variance: %.3f' % avg_var)

# Build a bagging model and obtain predictions on the test data
bagging = BaggingClassifier()
bagging.fit(X_train, y_train)
y_hat_pop_bagging = bagging.predict(X_test)

by_error, bavg_bias, bavg_var = bias_variance_decomp(
    bagging, X_train, y_train, X_test, y_test,
    loss='0-1_loss', random_seed=123)

print('Using Bagging')
print('Average expected loss: %.3f' % by_error)
print('Average bias: %.3f' % bavg_bias)
print('Average variance: %.3f' % bavg_var)

# Plot the bias-variance decomposition for both models.
# With 0-1 loss the bias term is not squared, so it is labelled 'Bias'.
labels = ['Expected Loss', 'Bias', 'Variance']
tree_values = [y_error, avg_bias, avg_var]
bagging_values = [by_error, bavg_bias, bavg_var]

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.bar(labels, tree_values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition (Decision Tree)')

plt.subplot(1, 2, 2)
plt.bar(labels, bagging_values)
plt.xlabel('Terms')
plt.ylabel('Value')
plt.title('Bias-Variance Decomposition (Bagging)')

plt.tight_layout()
plt.show()
Output:
Using Single Estimator
Average expected loss: 0.030
Average bias: 0.026
Average variance: 0.023

Using Bagging
Average expected loss: 0.035
Average bias: 0.053
Average variance: 0.020
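In the output above the loss is not the sum of bias and variance, and for bagging the bias is even larger than the loss. That is expected with 0-1 loss, where the terms are defined through a "main prediction" (the most frequent predicted class per test point) rather than a mean. The sketch below works through a tiny hand-made example using those definitions, which we understand to be the ones mlxtend applies for '0-1_loss'; the label and prediction arrays are made up for illustration.

```python
import numpy as np

# True labels for 4 test points (hypothetical)
y_test = np.array([0, 1, 2, 1])

# Predictions from 5 bootstrap rounds: one row per round, one column per test point
all_pred = np.array([
    [0, 1, 2, 2],
    [0, 1, 1, 2],
    [0, 2, 2, 2],
    [0, 1, 2, 1],
    [1, 1, 2, 2],
])

# Main prediction: the most frequent class predicted for each test point
main_pred = np.array([np.bincount(col).argmax() for col in all_pred.T])

avg_loss = np.mean(all_pred != y_test)     # fraction of all predictions that are wrong
avg_bias = np.mean(main_pred != y_test)    # fraction of test points whose main prediction is wrong
avg_var = np.mean(all_pred != main_pred)   # fraction of predictions that differ from the main prediction

print("main prediction:", main_pred)   # [0 1 2 2]
print(f"loss     = {avg_loss:.2f}")    # 0.35
print(f"bias     = {avg_bias:.2f}")    # 0.25
print(f"variance = {avg_var:.2f}")     # 0.20
```

On the last test point the main prediction is wrong, so the one round that deviates from it actually gets the label right. Variance on a biased point can therefore reduce the loss, which is why bias plus variance (0.45) overshoots the loss (0.35) here.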