Bias-Variance Tradeoff Using Python
To show the bias-variance tradeoff using Python, we can create a simple example using polynomial regression. We’ll generate some synthetic data and fit polynomial models of different degrees to observe how bias and variance change with model complexity.
Importing Necessary Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
Generating Synthetic Data
np.random.seed(0)
X = np.linspace(0, 10, 100)
y = 0.5 * X**2 - X + np.random.normal(0, 3, 100)
Fitting the Model
We’ll define a function to fit polynomial models of different degrees.
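def fit_polynomial_model(X, y, degree):
    # Expand X into polynomial terms up to `degree`, then fit ordinary least squares
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # scikit-learn expects a 2D feature array, so add a column axis to X
    model.fit(X[:, np.newaxis], y)
    return model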
Plotting the Models
We fit polynomial models of degrees 1, 2, 3, and 4 to the data and plot the results.
degrees = [1, 2, 3, 4]
plt.figure(figsize=(12, 6))
for i, degree in enumerate(degrees, 1):
    # Fit a model of this degree and predict on the training inputs
    model = fit_polynomial_model(X, y, degree)
    y_pred = model.predict(X[:, np.newaxis])
    # One panel per degree in a 2 x 2 grid
    plt.subplot(2, 2, i)
    plt.scatter(X, y, color='blue', label='data')
    plt.plot(X, y_pred, color='red', label='model')
    plt.title(f'Degree {degree} polynomial')
    plt.legend()
plt.tight_layout()
plt.show()
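Output: a 2 x 2 grid of plots, one per polynomial degree (1 to 4), each showing the data points in blue and the fitted curve in red.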
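The degree 1 model is too rigid to follow the curved trend in the data, so it underfits (high bias). Raising the degree lowers the bias because the model can fit the data more closely, but it also raises the variance because the model becomes more sensitive to the noise in the particular sample it was trained on. Since the data were generated from a quadratic function, degree 2 already captures the trend, and the extra flexibility of degrees 3 and 4 mostly goes toward fitting noise. This is the bias-variance tradeoff.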
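The plots use a single noisy sample, so they do not show variance directly. One way to make both quantities visible is to redraw the noise many times, refit each model, and compare the predictions with the known true function. The sketch below does this under some assumptions that are not part of the original example: the helper names `true_function` and `estimate_bias_variance` are hypothetical, `np.polyfit` is swapped in for the scikit-learn pipeline to keep the loop short, and 200 trials is an arbitrary choice.

```python
import numpy as np

def true_function(x):
    # Noise-free version of the synthetic data used above (hypothetical helper)
    return 0.5 * x**2 - x

def estimate_bias_variance(degree, n_trials=200, noise_std=3.0, seed=0):
    """Estimate squared bias and variance of a polynomial fit by resampling the noise."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, 100)
    f = true_function(x)
    preds = np.empty((n_trials, x.size))
    for t in range(n_trials):
        # Draw a fresh noisy dataset from the same underlying function
        y = f + rng.normal(0, noise_std, x.size)
        # Least-squares polynomial fit (stand-in for the pipeline in the article)
        coefs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coefs, x)
    mean_pred = preds.mean(axis=0)
    # Squared bias: how far the average prediction is from the truth
    bias_sq = np.mean((mean_pred - f) ** 2)
    # Variance: how much predictions move from one dataset to the next
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

results = {d: estimate_bias_variance(d) for d in [1, 2, 3, 4]}
for d, (bias_sq, variance) in results.items():
    print(f"degree {d}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

With this setup, the squared bias should be large for degree 1 and close to zero from degree 2 onward, while the variance should grow steadily with the degree.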
Finding the right balance between bias and variance is essential. Increasing model complexity usually reduces bias but increases variance, while decreasing it does the opposite. Because the two cannot both be driven to zero at once, the goal is to tune model complexity so that their combined contribution to the error is as small as possible, which gives models that generalize well to unseen data.
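The balance described here can be stated with the standard decomposition of expected squared error. The notation is an assumption introduced for illustration: the data follow y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted on a random training set, and expectations are taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model, so choosing the model complexity amounts to minimizing their sum.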
How to Balance the Bias-Variance Tradeoff
The bias-variance tradeoff is a fundamental concept in machine learning: it comes down to striking the right balance between model complexity and generalization performance. Understanding it is essential for choosing the model that works best for a given problem and for explaining why different models behave the way they do.
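One common way to look for this balance in practice is cross-validation: score each candidate complexity on held-out folds and keep the one with the lowest error. The sketch below applies this to the polynomial example from earlier. The choice of 5 shuffled folds, the degree range 1 to 6, and the names `cv_mse` and `best_degree` are assumptions made for illustration, not part of the original article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as in the example above
np.random.seed(0)
X = np.linspace(0, 10, 100)
y = 0.5 * X**2 - X + np.random.normal(0, 3, 100)

# Shuffle the folds: X is sorted, so unshuffled folds would test extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # scikit-learn reports negated MSE so that higher is better; flip the sign back
    scores = cross_val_score(
        model, X[:, np.newaxis], y, cv=cv, scoring="neg_mean_squared_error"
    )
    cv_mse[degree] = -scores.mean()

# Pick the degree with the lowest held-out error
best_degree = min(cv_mse, key=cv_mse.get)
for degree, mse in cv_mse.items():
    print(f"degree {degree}: cross-validated MSE = {mse:.2f}")
print("selected degree:", best_degree)
```

The held-out error should drop sharply from degree 1 to degree 2 and then change little. Which of the higher degrees scores lowest depends on the particular noise sample and fold split.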