How to Mitigate Overfitting by Creating Ensembles
Overfitting is a common problem in machine learning: a model fits its training data so closely that it performs poorly on new, unseen data. Ensembles are a useful way to reduce overfitting because they combine the predictions of several models, which improves robustness and generalization. This tutorial shows how to set up ensembles in Scikit-Learn, with XGBoost and Keras used for two of the examples, to deal with overfitting.
What is overfitting?
A machine learning model is overfitted when it fits the training data too closely and captures noise and incidental patterns that do not carry over to new, unseen data. Because such a model cannot generalize beyond the training set, it performs worse on new datasets than its training score suggests.
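One way to see this effect is to compare accuracy on the training set with accuracy on held-out data. The following is a minimal sketch rather than part of the walkthrough: it assumes a synthetic dataset from `make_classification` with roughly 20% of the labels flipped (`flip_y=0.2`) to act as noise, and an unconstrained `DecisionTreeClassifier`. The dataset size and the seeds are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with deliberately noisy labels (illustrative assumption)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unconstrained tree keeps splitting until it has memorized the training set
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"gap:            {train_acc - test_acc:.3f}")
```

A large gap between the two scores is the usual symptom of overfitting. A test score on its own does not reveal it.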
Why Should We Mitigate Overfitting?
Overfitting is a key issue in machine learning models because it has a negative influence on the model’s capacity to generalize to new data. Overfitting mitigation is crucial for a number of reasons:
- Generalization: A machine learning model’s ability to generalize successfully to new, unobserved data is its main objective. A model performs poorly on fresh data when it overfits, which happens when it learns the noise in the training set instead of the underlying pattern.
- Performance: Overfitted models may perform exceptionally well on training data but noticeably worse on new data. Reducing overfitting narrows that gap, so the model's predictions on unseen data become more accurate and dependable.
- Robustness: Models that are prone to overfitting are sensitive to small changes in the training data, which can lead to unstable predictions. By mitigating overfitting, the model becomes more robust to variations in the data.
- Interpretability: Overfit models tend to be more complex than the data warrants, which makes them hard to understand. Reducing overfitting often means simplifying the model, which makes the underlying patterns in the data easier to analyze and interpret.
What are Ensembles?
Ensemble learning is a machine learning technique in which the predictions of several predictors, such as classifiers or regressors, are aggregated to produce results that are often better than those of any individual predictor. An ensemble is the collection of predictors whose combined predictions improve performance, and the term “ensemble method” refers to the general methodology used to build and combine them. The fundamental idea behind ensemble learning is to bring together a number of weak learners so that together they form a strong learner.
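The aggregation idea can be illustrated with a toy majority vote. This is a sketch with made-up numbers: the three prediction rows are hypothetical model outputs, chosen so that each model is wrong on a different sample, and `majority_vote` is a helper introduced here for illustration.

```python
import numpy as np

y_true = np.array([0, 1, 1, 2, 0, 2])

# Hypothetical predictions from three models, each wrong on a different sample
predictions = np.array([
    [1, 1, 1, 2, 0, 2],  # model A: wrong on sample 0
    [0, 1, 2, 2, 0, 2],  # model B: wrong on sample 2
    [0, 1, 1, 2, 0, 1],  # model C: wrong on sample 5
])

def majority_vote(preds):
    """Return the most frequent label in each column (one column per sample).

    Ties are resolved in favour of the smallest label, because argmax
    returns the first maximum.
    """
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

ensemble = majority_vote(predictions)
individual_acc = (predictions == y_true).mean(axis=1)
ensemble_acc = (ensemble == y_true).mean()

print("individual accuracies:", individual_acc)
print("ensemble accuracy:", ensemble_acc)
```

Each model gets five of the six samples right, while the vote gets all six right. That only happens because the models' errors fall on different samples. If all three made the same mistakes, voting would not help.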
Types of Ensembles
- Bootstrap Aggregating (Bagging): In bagging, several instances of the same base model are trained on different subsets of the training data obtained by bootstrapping (random sampling with replacement). The final prediction is usually a majority vote (for classification) or an average (for regression) of the individual model predictions.
- Boosting: Boosting trains weak models one after the other, with each new model giving more weight to the cases that the previous models misclassified. The final model is a weighted combination of the weak models.
- Stacking: In stacking, several different models are trained, and a meta-model is then used to combine their predictions. The meta-model learns how to weight the base models’ predictions.
- Dropout: Dropout is a regularization technique for neural networks in which randomly chosen neurons are temporarily deactivated during training, so the network cannot become too dependent on particular features or combinations of neurons. It is not an ensemble method in the strict sense, but it is often interpreted as implicitly training an ensemble of subnetworks, and it yields a model that generalizes better.
- Voting: Voting produces a final prediction by aggregating the predictions of several models. There are two types: hard voting takes a majority vote over the predicted class labels, while soft voting averages the predicted class probabilities (optionally weighted) and picks the most probable class.
- Ensemble of Diverse Models: Diverse base models are advantageous to ensembles. An ensemble may be made more adaptable by experimenting with alternative methods or hyperparameters.
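To make the bagging description above concrete, here is a hand-written sketch of bootstrap aggregating on the Iris data. It is an illustration under stated assumptions, not a replacement for `BaggingClassifier`: the number of models (25), the seeds, and the variable names are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(0)
n_models = 25  # arbitrary ensemble size for this sketch
all_preds = []

for _ in range(n_models):
    # Bootstrap: draw len(X_train) row indices at random, with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    all_preds.append(tree.predict(X_test))

all_preds = np.array(all_preds)  # shape: (n_models, n_test_samples)

# Aggregate: majority vote over the models for each test sample
bagged = np.apply_along_axis(
    lambda col: np.bincount(col, minlength=3).argmax(), 0, all_preds)

manual_bagging_accuracy = (bagged == y_test).mean()
print("Manual bagging accuracy:", manual_bagging_accuracy)
```

Every tree sees a slightly different training set, so the trees disagree on borderline samples and the vote smooths those disagreements out.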
Step-by-Step Guide to Applying Different Ensemble Methods
Importing necessary libraries
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import (
    RandomForestClassifier,
    AdaBoostClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
    StackingClassifier,
    BaggingClassifier,
)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from xgboost import XGBClassifier
```
Loading and Splitting the dataset
```python
# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```
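With only 150 samples, a purely random split can leave the three classes unevenly represented in the test set. As an optional variation, and not something the original split does, `train_test_split` accepts a `stratify` argument that preserves class proportions. The sketch below compares the class counts of the two test sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the tutorial, without and with stratification
_, _, _, y_test_plain = train_test_split(
    X, y, test_size=0.2, random_state=42)
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print("test class counts, plain split:     ", np.bincount(y_test_plain))
print("test class counts, stratified split:", np.bincount(y_test_strat))
```

Iris has 50 samples per class, so the stratified 20% test set contains exactly 10 samples of each class.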
Implementing Various Ensemble Methods
- Bagging with Random Forests: Uses BaggingClassifier with RandomForestClassifier as the base estimator, so each of the 10 bagged members is itself a forest of decision trees.
- Boosting algorithms: Includes AdaBoostClassifier with DecisionTreeClassifier as the base estimator, GradientBoostingClassifier, and XGBClassifier (XGBoost).
- Stacking: Uses StackingClassifier to combine predictions from RandomForestClassifier, SVC, and LogisticRegression, with LogisticRegression as the final estimator.
- Dropout: Implements a neural network using Sequential from Keras with a dropout layer to reduce overfitting.
- Voting: Combines predictions from RandomForestClassifier, SVC, and LogisticRegression using hard voting.
- Ensemble of Diverse Models: Trains an SVC and a DecisionTreeClassifier as two different kinds of base model. The code fits them separately and does not combine their predictions.
```python
# Bagging with Random Forests
# (the keyword is `estimator` in scikit-learn 1.2 and later;
# older releases call it `base_estimator`)
bagging_model = BaggingClassifier(estimator=RandomForestClassifier(), n_estimators=10)
bagging_model.fit(X_train, y_train)
bagging_predictions = bagging_model.predict(X_test)

# Boosting algorithms
# AdaBoost here uses a fully grown tree as in the original code;
# a shallow tree (e.g. max_depth=1) is the more typical weak learner
adaboost_model = AdaBoostClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)
adaboost_model.fit(X_train, y_train)
adaboost_predictions = adaboost_model.predict(X_test)

gradient_boost_model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=1.0, max_depth=1, random_state=42)
gradient_boost_model.fit(X_train, y_train)
gradient_boost_predictions = gradient_boost_model.predict(X_test)

xgboost_model = XGBClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
xgboost_model.fit(X_train, y_train)
xgboost_predictions = xgboost_model.predict(X_test)

# Stacking: a logistic regression learns how to combine the base models
base_models = [
    ('rf', RandomForestClassifier()),
    ('svc', SVC()),
    ('lr', LogisticRegression()),
]
stacking_model = StackingClassifier(
    estimators=base_models, final_estimator=LogisticRegression())
stacking_model.fit(X_train, y_train)
stacking_predictions = stacking_model.predict(X_test)

# Dropout: small neural network with one dropout layer (4 features, 3 classes)
dropout_model = Sequential([
    Dense(128, input_dim=4, activation='relu'),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(3, activation='softmax'),
])
dropout_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
dropout_model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
_, dropout_accuracy = dropout_model.evaluate(X_test, y_test, verbose=0)

# Voting: majority vote over the same three base models
voting_model = VotingClassifier(estimators=base_models, voting='hard')
voting_model.fit(X_train, y_train)
voting_predictions = voting_model.predict(X_test)

# Ensemble of Diverse Models: two different model families, fitted separately
svm_model = SVC()
svm_model.fit(X_train, y_train)
svm_predictions = svm_model.predict(X_test)

dt_model = DecisionTreeClassifier()
dt_model.fit(X_train, y_train)
dt_predictions = dt_model.predict(X_test)
```
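The SVC and the decision tree above are fitted side by side but never combined. One possible way to turn them into an actual ensemble is soft voting, sketched here as a suggestion rather than as the original method. The sketch assumes `SVC(probability=True)` so that the SVC can output class probabilities, and it caps the tree at `max_depth=3` (an arbitrary choice) so that its probabilities are not all exactly 0 or 1. With only two members, hard voting would often end in a tie, which is why soft voting is used.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Soft voting averages the predicted class probabilities of both models
diverse_model = VotingClassifier(
    estimators=[
        ("svc", SVC(probability=True, random_state=42)),
        ("dt", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ],
    voting="soft",
)
diverse_model.fit(X_train, y_train)
diverse_predictions = diverse_model.predict(X_test)

diverse_accuracy = accuracy_score(y_test, diverse_predictions)
print("Diverse ensemble accuracy:", diverse_accuracy)
```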
Comparing Accuracy
```python
# Evaluate models on the held-out test set
print("Bagging Accuracy:", accuracy_score(y_test, bagging_predictions))
print("AdaBoost Accuracy:", accuracy_score(y_test, adaboost_predictions))
print("Gradient Boosting Accuracy:", accuracy_score(y_test, gradient_boost_predictions))
print("XGBoost Accuracy:", accuracy_score(y_test, xgboost_predictions))
print("Stacking Accuracy:", accuracy_score(y_test, stacking_predictions))
print("Dropout Accuracy:", dropout_accuracy)
print("Voting Accuracy:", accuracy_score(y_test, voting_predictions))
```
Output (exact values can vary between runs, because several of the models are not seeded):
Bagging Accuracy: 1.0
AdaBoost Accuracy: 1.0
Gradient Boosting Accuracy: 0.9666666666666667
XGBoost Accuracy: 1.0
Stacking Accuracy: 1.0
Dropout Accuracy: 0.9666666388511658
Voting Accuracy: 1.0
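The test set here has only 30 samples, so a single misclassified flower changes accuracy by about 0.033 (0.9667 corresponds to 29 correct out of 30). Differences of that size between methods are not meaningful on their own. A steadier comparison is cross-validation over the whole dataset. The sketch below is an assumption-laden illustration: it picks three scikit-learn models and a seeded 5-fold `StratifiedKFold`, neither of which is part of the original code.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Five stratified folds: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "Single decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient boosting": GradientBoostingClassifier(random_state=42),
}

cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation alongside the mean shows how much of an apparent difference is just fold-to-fold noise.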
When to Use Which Ensemble Method?
The best ensemble approach depends on the nature of the problem, the properties of the data, and the computing resources available. Determining the optimal ensemble strategy for a given task requires experimentation and cross-validation.
Ensemble Method | When to use? |
---|---|
Bagging | Works well when the base model (such as a deep decision tree) is complex and prone to overfitting, that is, when its variance is high. |
Boosting | Useful when the base model is weak and there is room for improvement. Boosting mainly reduces bias and can handle high-dimensional data effectively. |
Stacking | Works well when different models capture different aspects of the data and there is enough data to train both the base models and the meta-model. |
Dropout | An effective way to reduce overfitting in neural networks; widely used in deep learning. |
Voting | A quick and easy way to combine different models. Hard voting is appropriate when a simple majority of predicted labels is trusted. |
Ensemble of Diverse Models | Suggested for combining models with different strengths and weaknesses. It is helpful when working with complex, heterogeneous datasets. |
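The bagging and boosting rows can be checked empirically. The sketch below is an illustration on synthetic data, with arbitrary sizes and seeds: it pairs a fully grown tree (high variance) with bagging and a depth-1 stump (high bias) with AdaBoost, and compares cross-validated accuracy. The base estimators are passed positionally so the code does not depend on the keyword name, which changed between scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary problem with a little label noise (illustrative assumption)
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.05, random_state=0)

deep_tree = DecisionTreeClassifier(random_state=0)           # high variance
stump = DecisionTreeClassifier(max_depth=1, random_state=0)  # high bias

candidates = {
    "single deep tree": deep_tree,
    "bagged deep trees": BaggingClassifier(deep_tree, n_estimators=50, random_state=0),
    "single stump": stump,
    "boosted stumps": AdaBoostClassifier(stump, n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}

for name, score in scores.items():
    print(f"{name:>18}: {score:.3f}")
```

Comparing each base model with its ensemble counterpart on your own data is a quick way to tell whether variance or bias is the bigger problem, and therefore which row of the table applies.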
Conclusion:
Overfitting can be reduced by combining machine learning models into ensembles, for example ensembles built from decision trees such as random forests and gradient boosting. By drawing on the strengths of each individual model, an ensemble improves robustness and generalization. This tutorial walked through an implementation that uses Scikit-Learn, together with XGBoost and Keras, to train individual models, assemble ensembles, and assess their performance. On the Iris dataset, the ensembles reach high accuracy on the held-out test split.