Types of Regression Metrics

Some common regression metrics in scikit-learn, each with an example:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • R-squared (R²) Score
  • Root Mean Squared Error (RMSE)

Mean Absolute Error (MAE)

The Mean Absolute Error (MAE) is a frequently used metric in statistics and machine learning. It measures the average absolute difference between a dataset’s actual (observed) values and the values predicted by a model.

Mathematical Formula

The formula to calculate MAE for a dataset with n data points is:

MAE = (1/n) * Σ |xi − yi|, where the sum runs over i = 1 to n

Where:

  • xi represents the actual or observed value for the i-th data point.
  • yi represents the predicted value for the i-th data point.

Example:

Python

from sklearn.metrics import mean_absolute_error
 
true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]
 
mae = mean_absolute_error(true_values, predicted_values)
print("Mean Absolute Error:", mae)


Output:

Mean Absolute Error: 0.22000000000000003
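
This value can also be cross-checked directly against the formula. The short NumPy sketch below is not part of the scikit-learn example above; it simply computes the mean of the absolute differences by hand on the same toy values:

Python

import numpy as np

true_values = np.array([2.5, 3.7, 1.8, 4.0, 5.2])
predicted_values = np.array([2.1, 3.9, 1.7, 3.8, 5.0])

# MAE = (1/n) * Σ |xi − yi|
manual_mae = np.mean(np.abs(true_values - predicted_values))
print("Manual MAE:", manual_mae)  # ≈ 0.22, matching mean_absolute_error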

Mean Squared Error (MSE)

The Mean Squared Error (MSE) is a popular metric in statistics and machine learning. It measures the average of the squared differences between a dataset’s actual values and predicted values. MSE is frequently used in regression problems to assess how well predictive models perform.

Mathematical Formula

For a dataset containing n data points, the formula to calculate MSE is:

MSE = (1/n) * Σ (xi − yi)², where the sum runs over i = 1 to n

where:

  • xi represents the actual or observed value for the i-th data point.
  • yi represents the predicted value for the i-th data point.

Example:

Python

from sklearn.metrics import mean_squared_error
 
true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]
 
mse = mean_squared_error(true_values, predicted_values)
print("Mean Squared Error:", mse)


Output:

Mean Squared Error: 0.057999999999999996
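
As with MAE, the MSE can be reproduced by hand from the formula. The NumPy sketch below is illustrative and reuses the same toy values:

Python

import numpy as np

true_values = np.array([2.5, 3.7, 1.8, 4.0, 5.2])
predicted_values = np.array([2.1, 3.9, 1.7, 3.8, 5.0])

# MSE = (1/n) * Σ (xi − yi)²
manual_mse = np.mean((true_values - predicted_values) ** 2)
print("Manual MSE:", manual_mse)  # ≈ 0.058, matching mean_squared_error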

R-squared (R²) Score

The R-squared (R²) score, also known as the coefficient of determination, is a statistical metric frequently used to assess the goodness of fit of a regression model. It quantifies the proportion of the variance in the dependent variable that is explained by the model’s independent variables. R² is a useful statistic for evaluating the overall effectiveness and explanatory power of a regression model.

Mathematical Formula

The formula to calculate the R-squared score is as follows:

R² = 1 − (SSR / SST)

Where:

  • R² is the R-squared score.
  • SSR represents the sum of squared residuals between the predicted values and actual values.
  • SST represents the total sum of squares, which measures the total variance in the dependent variable.

Example:

Python

from sklearn.metrics import r2_score
 
true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]
 
r2 = r2_score(true_values, predicted_values)
print("R-squared (R²) Score:", r2)


Output:

R-squared (R²) Score: 0.9588769143505389
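
The same score can be rebuilt from its two components, SSR and SST, which makes the formula concrete. The sketch below is a hand calculation with NumPy, not a scikit-learn API call:

Python

import numpy as np

true_values = np.array([2.5, 3.7, 1.8, 4.0, 5.2])
predicted_values = np.array([2.1, 3.9, 1.7, 3.8, 5.0])

ssr = np.sum((true_values - predicted_values) ** 2)       # sum of squared residuals
sst = np.sum((true_values - np.mean(true_values)) ** 2)   # total sum of squares
manual_r2 = 1 - ssr / sst
print("Manual R²:", manual_r2)  # ≈ 0.9589, matching r2_score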

Root Mean Squared Error (RMSE)

RMSE stands for Root Mean Squared Error. It is a commonly used metric in regression analysis and machine learning to measure the accuracy, or goodness of fit, of a predictive model, especially when the predictions are continuous numerical values.

The RMSE quantifies how well the predicted values from a model align with the actual observed values in the dataset. Here’s how it works:

  1. Calculate the Squared Differences: For each data point, subtract the predicted value from the actual (observed) value, square the result, and sum up these squared differences.
  2. Compute the Mean: Divide the sum of squared differences by the number of data points to get the mean squared error (MSE).
  3. Take the Square Root: To obtain the RMSE, simply take the square root of the MSE.
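
The three steps above translate directly into a few lines of NumPy. The arrays in this sketch are made up purely for illustration:

Python

import numpy as np

actual = np.array([3.0, 5.0, 2.5, 7.0])       # illustrative observed values
predicted = np.array([2.8, 5.4, 2.1, 6.5])    # illustrative predictions

squared_diffs = (actual - predicted) ** 2     # step 1: squared differences
mse = squared_diffs.mean()                    # step 2: mean squared error
rmse = np.sqrt(mse)                           # step 3: square root of the MSE
print("RMSE:", rmse)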

Mathematical Formula

The formula for RMSE for a dataset with n data points is as follows:

RMSE = √[ (1/n) * Σ (xi − yi)² ], where the sum runs over i = 1 to n

Where:

  • RMSE is the Root Mean Squared Error.
  • xi represents the actual or observed value for the i-th data point.
  • yi represents the predicted value for the i-th data point.

Example:

Python

from sklearn.metrics import mean_squared_error
import numpy as np
 
# Sample data
true_prices = np.array([250000, 300000, 200000, 400000, 350000])
predicted_prices = np.array([240000, 310000, 210000, 380000, 340000])
 
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(true_prices, predicted_prices))
 
print("Root Mean Squared Error (RMSE):", rmse)


Output:

Root Mean Squared Error (RMSE): 12649.110640673518
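
If the installed scikit-learn version is 1.4 or newer, the same value can also be obtained with the dedicated helper sklearn.metrics.root_mean_squared_error, which avoids the manual square root; on older versions the np.sqrt(mean_squared_error(...)) pattern above works fine:

Python

# Requires scikit-learn >= 1.4
from sklearn.metrics import root_mean_squared_error
import numpy as np

true_prices = np.array([250000, 300000, 200000, 400000, 350000])
predicted_prices = np.array([240000, 310000, 210000, 380000, 340000])

rmse = root_mean_squared_error(true_prices, predicted_prices)
print("Root Mean Squared Error (RMSE):", rmse)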

NOTE:

When using regression metrics in scikit-learn, each function generally returns a single numerical value, which makes it straightforward to summarize and compare models.
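
As a closing sketch, all four metrics can be computed side by side on the same predictions (reusing the toy values from the earlier examples):

Python

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]

print("MAE :", mean_absolute_error(true_values, predicted_values))
print("MSE :", mean_squared_error(true_values, predicted_values))
print("RMSE:", np.sqrt(mean_squared_error(true_values, predicted_values)))
print("R²  :", r2_score(true_values, predicted_values))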
