Types of Regression Metrics

Some common regression metrics in scikit-learn, each with an example:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • R-squared (R²) Score
  • Root Mean Squared Error (RMSE)

Mean Absolute Error (MAE)

The Mean Absolute Error (MAE) is a frequently used metric in statistics and machine learning. It measures the average absolute difference between a dataset’s actual (observed) values and the values predicted by a model.

Mathematical Formula

The formula to calculate MAE for a dataset with n data points is:

MAE = (1/n) * Σ |xi − yi|, where the sum runs over i = 1 to n

Where:

  • xi represents the actual or observed value for the i-th data point.
  • yi represents the predicted value for the i-th data point.

Example:

Python

from sklearn.metrics import mean_absolute_error
 
true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]
 
mae = mean_absolute_error(true_values, predicted_values)
print("Mean Absolute Error:", mae)


Output:

Mean Absolute Error: 0.22000000000000003
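
This value can also be cross-checked directly against the formula. The short NumPy sketch below is not part of the scikit-learn example above; it simply computes the mean of the absolute differences by hand on the same toy values:

Python

import numpy as np

true_values = np.array([2.5, 3.7, 1.8, 4.0, 5.2])
predicted_values = np.array([2.1, 3.9, 1.7, 3.8, 5.0])

# MAE = (1/n) * Σ |xi − yi|
manual_mae = np.mean(np.abs(true_values - predicted_values))
print("Manual MAE:", manual_mae)  # ≈ 0.22, matching mean_absolute_error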

Mean Squared Error (MSE)

The Mean Squared Error (MSE) is a popular metric in statistics and machine learning. It measures the average of the squared differences between a dataset’s actual values and predicted values. MSE is frequently used in regression problems to assess how well predictive models perform.

Mathematical Formula

For a dataset containing n data points, the formula to calculate MSE is:

MSE = (1/n) * Σ (xi − yi)², where the sum runs over i = 1 to n

where:

  • xi represents the actual or observed value for the i-th data point.
  • yi represents the predicted value for the i-th data point.

Example:

Python

from sklearn.metrics import mean_squared_error
 
true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]
 
mse = mean_squared_error(true_values, predicted_values)
print("Mean Squared Error:", mse)


Output:

Mean Squared Error: 0.057999999999999996
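
As with MAE, the MSE can be reproduced by hand from the formula. The NumPy sketch below is illustrative and reuses the same toy values:

Python

import numpy as np

true_values = np.array([2.5, 3.7, 1.8, 4.0, 5.2])
predicted_values = np.array([2.1, 3.9, 1.7, 3.8, 5.0])

# MSE = (1/n) * Σ (xi − yi)²
manual_mse = np.mean((true_values - predicted_values) ** 2)
print("Manual MSE:", manual_mse)  # ≈ 0.058, matching mean_squared_error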

R-squared (R²) Score

The R-squared (R²) score, also known as the coefficient of determination, is a statistical metric frequently used to assess the goodness of fit of a regression model. It quantifies the proportion of the variance in the dependent variable that is explained by the model’s independent variables. R² is a useful statistic for evaluating the overall effectiveness and explanatory power of a regression model.

Mathematical Formula

The formula to calculate the R-squared score is as follows:

R² = 1 − (SSR / SST)

Where:

  • R² is the R-squared score.
  • SSR represents the sum of squared residuals between the predicted values and actual values.
  • SST represents the total sum of squares, which measures the total variance in the dependent variable.

Example:

Python

from sklearn.metrics import r2_score
 
true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]
 
r2 = r2_score(true_values, predicted_values)
print("R-squared (R²) Score:", r2)


Output:

R-squared (R²) Score: 0.9588769143505389
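
The same score can be rebuilt from its two components, SSR and SST, which makes the formula concrete. The sketch below is a hand calculation with NumPy, not a scikit-learn API call:

Python

import numpy as np

true_values = np.array([2.5, 3.7, 1.8, 4.0, 5.2])
predicted_values = np.array([2.1, 3.9, 1.7, 3.8, 5.0])

ssr = np.sum((true_values - predicted_values) ** 2)       # sum of squared residuals
sst = np.sum((true_values - np.mean(true_values)) ** 2)   # total sum of squares
manual_r2 = 1 - ssr / sst
print("Manual R²:", manual_r2)  # ≈ 0.9589, matching r2_score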

Root Mean Squared Error (RMSE)

RMSE stands for Root Mean Squared Error. It is a commonly used metric in regression analysis and machine learning to measure the accuracy, or goodness of fit, of a predictive model, especially when the predictions are continuous numerical values.

The RMSE quantifies how well the predicted values from a model align with the actual observed values in the dataset. Here’s how it works:

  1. Calculate the Squared Differences: For each data point, subtract the predicted value from the actual (observed) value, square the result, and sum up these squared differences.
  2. Compute the Mean: Divide the sum of squared differences by the number of data points to get the mean squared error (MSE).
  3. Take the Square Root: To obtain the RMSE, simply take the square root of the MSE.
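
The three steps above translate directly into a few lines of NumPy. The arrays in this sketch are made up purely for illustration:

Python

import numpy as np

actual = np.array([3.0, 5.0, 2.5, 7.0])       # illustrative observed values
predicted = np.array([2.8, 5.4, 2.1, 6.5])    # illustrative predictions

squared_diffs = (actual - predicted) ** 2     # step 1: squared differences
mse = squared_diffs.mean()                    # step 2: mean squared error
rmse = np.sqrt(mse)                           # step 3: square root of the MSE
print("RMSE:", rmse)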

Mathematical Formula

The formula for RMSE for a dataset with n data points is as follows:

RMSE = √[ (1/n) * Σ (xi − yi)² ], where the sum runs over i = 1 to n

Where:

  • RMSE is the Root Mean Squared Error.
  • xi represents the actual or observed value for the i-th data point.
  • yi represents the predicted value for the i-th data point.

Example:

Python

from sklearn.metrics import mean_squared_error
import numpy as np
 
# Sample data
true_prices = np.array([250000, 300000, 200000, 400000, 350000])
predicted_prices = np.array([240000, 310000, 210000, 380000, 340000])
 
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(true_prices, predicted_prices))
 
print("Root Mean Squared Error (RMSE):", rmse)


Output:

Root Mean Squared Error (RMSE): 12649.110640673518
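
If the installed scikit-learn version is 1.4 or newer, the same value can also be obtained with the dedicated helper sklearn.metrics.root_mean_squared_error, which avoids the manual square root; on older versions the np.sqrt(mean_squared_error(...)) pattern above works fine:

Python

# Requires scikit-learn >= 1.4
from sklearn.metrics import root_mean_squared_error
import numpy as np

true_prices = np.array([250000, 300000, 200000, 400000, 350000])
predicted_prices = np.array([240000, 310000, 210000, 380000, 340000])

rmse = root_mean_squared_error(true_prices, predicted_prices)
print("Root Mean Squared Error (RMSE):", rmse)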

NOTE:

When using regression metrics in scikit-learn, each function generally returns a single numerical value, which makes it straightforward to summarize and compare models.
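
As a closing sketch, all four metrics can be computed side by side on the same predictions (reusing the toy values from the earlier examples):

Python

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

true_values = [2.5, 3.7, 1.8, 4.0, 5.2]
predicted_values = [2.1, 3.9, 1.7, 3.8, 5.0]

print("MAE :", mean_absolute_error(true_values, predicted_values))
print("MSE :", mean_squared_error(true_values, predicted_values))
print("RMSE:", np.sqrt(mean_squared_error(true_values, predicted_values)))
print("R²  :", r2_score(true_values, predicted_values))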
