Fuel Efficiency Forecasting with CatBoost

The automobile sector is continually looking for new ways to cut fuel use in its pursuit of economy and sustainability. Rising fuel costs and a growing emphasis on environmental sustainability have made it more important to understand how much fuel cars consume. One approach is to forecast and analyze fuel use with machine learning. This blog article introduces CatBoost, a gradient boosting library, and shows how it can be applied to modeling automobile fuel usage. With an emphasis on simplicity, the post walks through the basic ideas, offers examples, and lists the steps required to put the solution into practice, starting with the fundamentals and building up from there.

Table of Contents

  • Fuel Consumption in Vehicles Using CatBoost
  • The Power of CatBoost
  • Steps to Predict Fuel Consumption Using CatBoost
  • Develop a CatBoost Model for Fuel Consumption in Vehicles
  • Conclusion

Fuel Consumption in Vehicles Using CatBoost

Within the automobile sector, fuel consumption prediction plays an important role in both vehicle design and the optimization of driver behaviour. Machine learning models, especially gradient boosting methods, have made these predictions easier to produce. This article uses CatBoost, a high-performance gradient boosting library, to forecast car fuel use. It covers the fundamental ideas and then gives a step-by-step tutorial on building a predictive model, so that even a novice should finish with a firm grasp of how to use CatBoost for this purpose.

The Power of CatBoost

CatBoost, short for “Category Boosting,” is an open-source gradient boosting library developed by Yandex that excels at handling categorical features and is known for its speed and accuracy. It works particularly well when the data includes features that represent groups or categories, such as car type or fuel type. CatBoost offers several benefits for fuel consumption prediction:

  • High Accuracy: Gradient boosting on decision trees is often among the most accurate approaches for tabular data such as vehicle specifications, and CatBoost's default settings tend to perform well without extensive tuning.
  • Handles Complex Data: CatBoost can process numerical data (engine size), categorical data (fuel type), and text (car model) without manual encoding.
  • Speedy and Efficient: Training and prediction are fast, which makes it suitable for real-world applications.
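
To give a feel for how categorical values can be turned into numbers without one-hot encoding, here is a minimal sketch of ordered target statistics, the idea behind CatBoost's categorical handling. It is a simplified illustration, not CatBoost's actual implementation, which also uses random permutations of the data among other refinements. The function name, the smoothing weight, and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, prior, weight=1.0):
    """Encode each row using only the targets of earlier rows in the same category.

    Looking only at earlier rows keeps a row's own target out of its encoding,
    which is the leakage that plain target encoding suffers from.
    """
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running count of rows per category
    encoded = []
    for category, target in zip(categories, targets):
        encoded.append((sums[category] + weight * prior) / (counts[category] + weight))
        sums[category] += target
        counts[category] += 1
    return encoded


# Hypothetical toy data: fuel type and fuel consumption in l/100 km
fuel_types = ["petrol", "diesel", "petrol", "diesel", "petrol"]
consumption = [9.0, 6.0, 11.0, 7.0, 10.0]
prior = sum(consumption) / len(consumption)  # global mean, used when no history exists

encoded = ordered_target_encode(fuel_types, consumption, prior)
print(encoded)
```

The first petrol and first diesel rows receive the global mean because no earlier rows exist for them, while later rows drift toward the average consumption of their own category.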

Steps to Predict Fuel Consumption Using CatBoost

Prerequisite:

First, we need to install the catboost package. The leading exclamation mark runs the command from a notebook cell; omit it when installing from a terminal.

!pip install catboost

1. Data Collection and Preprocessing

Collect your dataset first. For fuel consumption prediction, it may include features such as vehicle type, engine size, fuel type, and weight, along with historical fuel consumption records.

2. Data Cleaning

Handle missing values and eliminate duplicates to clean up the data. Categorical variables do not need to be one-hot or label encoded by hand, because CatBoost can handle categorical data directly once it is told which columns are categorical.
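
As a rough illustration of this cleaning step, the sketch below uses pandas on a tiny inline table. The column names and values are made up for the example and do not come from the dataset used later. Filling missing categories with a placeholder string is one possible choice, made here because CatBoost expects categorical values to be strings or integers rather than NaN.

```python
import io

import pandas as pd

# Hypothetical raw data with one duplicate row and two kinds of missing value
raw = io.StringIO(
    "vehicle_type,engine_size,fuel_type,consumption\n"
    "sedan,1.6,petrol,7.2\n"
    "sedan,1.6,petrol,7.2\n"
    "suv,?,diesel,8.9\n"
    "hatchback,1.2,,5.8\n"
)

# Treat '?' as missing while reading (empty fields are already read as NaN)
df = pd.read_csv(raw, na_values="?")

# Remove exact duplicate rows
df = df.drop_duplicates()

# Drop rows where a numeric feature is missing
df = df.dropna(subset=["engine_size"])

# Keep rows with a missing category, but label the gap explicitly
df["fuel_type"] = df["fuel_type"].fillna("unknown").astype(str)

print(df)
```

Whether to drop or fill missing values depends on how much data you can afford to lose; with a small dataset, filling is often the safer option.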

3. Splitting the Data

Split your data into training and test sets to evaluate the model’s performance.

4. Training the Model

Use the training dataset to train the CatBoost model. Declare the categorical features and set other parameters such as the number of iterations, the learning rate, and the tree depth.

5. Evaluating the Model

Use measures such as Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) to assess the model performance.
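
To make the two metrics concrete, here is a small worked example in plain Python. The numbers are invented for illustration. MAE averages the absolute errors, while RMSE squares them first, so a single large miss raises RMSE much more than MAE.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: the average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical fuel consumption values in l/100 km
actual = [8.0, 10.0, 12.0, 14.0]
predicted = [9.0, 10.0, 11.0, 18.0]  # errors: -1, 0, 1, -4

print(f"MAE:  {mae(actual, predicted):.3f}")   # (1 + 0 + 1 + 4) / 4 = 1.5
print(f"RMSE: {rmse(actual, predicted):.3f}")  # sqrt((1 + 0 + 1 + 16) / 4) = sqrt(4.5)
```

Both metrics are in the same unit as the target, so here they can be read directly as litres per 100 km.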

Develop a CatBoost Model for Fuel Consumption in Vehicles

Let’s get our hands dirty and develop a CatBoost model to forecast fuel usage now! Below is an explanation of the procedure:

Step 1: Importing Libraries

We start by importing the necessary libraries.

Python
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error
import matplotlib.pyplot as plt
import seaborn as sns

Step 2: Loading the Dataset

We are going to use the Auto MPG dataset from the UCI Machine Learning Repository.

Python
# Load the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'car_name']
# Columns are separated by runs of whitespace; sep=r'\s+' replaces the deprecated delim_whitespace=True
data = pd.read_csv(url, names=column_names, sep=r'\s+')

# Display the first few rows of the dataset
print(data.head())

Output:

    mpg  cylinders  displacement horsepower  weight  acceleration  model_year  \
0  18.0          8         307.0      130.0  3504.0          12.0          70
1  15.0          8         350.0      165.0  3693.0          11.5          70
2  18.0          8         318.0      150.0  3436.0          11.0          70
3  16.0          8         304.0      150.0  3433.0          12.0          70
4  17.0          8         302.0      140.0  3449.0          10.5          70

   origin                   car_name
0       1  chevrolet chevelle malibu
1       1          buick skylark 320
2       1         plymouth satellite
3       1              amc rebel sst
4       1                ford torino

Step 3: Preprocessing the Data

We will handle missing values, convert column types, build the target variable, and select the relevant features.

Python
# Replace '?' with NaN and drop missing values
data.replace('?', np.nan, inplace=True)
data.dropna(inplace=True)

# Convert relevant columns to numeric
data['horsepower'] = data['horsepower'].astype(float)

# Convert categorical columns to category type
data['origin'] = data['origin'].astype('category')

# Create the target: convert mpg (US gallons) to l/100 km, using l/100 km = 235.215 / mpg
data['fuel_consumption'] = 235.215 / data['mpg']

# Define features and target variable
X = data[['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin']]
y = data['fuel_consumption']
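
The constant 235.215 used for the target follows from the definitions of the US gallon and the mile. The short sketch below derives it and checks it against the first row of the dataset (18 mpg). The function and constant names are chosen for this example only.

```python
LITRES_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def mpg_to_l_per_100km(mpg):
    """Convert miles per US gallon to litres per 100 km."""
    # 1 mpg means 1.609344 km on 3.785411784 litres, so scale litres up to 100 km
    return 100 * LITRES_PER_US_GALLON / (KM_PER_MILE * mpg)


print(round(mpg_to_l_per_100km(1.0), 3))   # 235.215, the constant used above
print(round(mpg_to_l_per_100km(18.0), 2))  # 13.07 l/100 km for the first car
```

Note that the relationship is inverse: a higher mpg means a lower l/100 km figure, so the model now predicts a quantity where smaller is better.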

Step 4: Splitting the Data

Split the data into training and test sets.

Python
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Step 5: Training the CatBoost Model

Using the training data, we initialize and train the CatBoost regressor.

Python
# Initialize the CatBoost regressor
model = CatBoostRegressor(iterations=500, learning_rate=0.1, depth=6, cat_features=['origin'], verbose=100)

# Train the model
model.fit(X_train, y_train)

Output:

0:      learn: 3.6611097    total: 2.74ms    remaining: 1.37s
100:    learn: 0.8048411    total: 374ms     remaining: 1.48s
200:    learn: 0.5390630    total: 724ms     remaining: 1.08s
300:    learn: 0.4066602    total: 915ms     remaining: 605ms
400:    learn: 0.3128946    total: 1.13s     remaining: 278ms
499:    learn: 0.2514340    total: 1.53s     remaining: 0us
<catboost.core.CatBoostRegressor at 0x7a4cd147c910>

Step 6: Making Predictions and Evaluating the Model

We assess the trained model’s performance by using it to generate predictions on the test set.

Python
# Make predictions on the test set
y_pred = model.predict(X_test)

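# Calculate MAE and RMSE (the square root of the mean squared error);
# the squared=False argument has been removed from recent scikit-learn releases
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))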

print(f'MAE: {mae}')
print(f'RMSE: {rmse}')

Output:

MAE: 0.858216602638723
RMSE: 1.1143796648226478
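
An error figure is easier to judge when it is set against a naive baseline, such as always predicting the average consumption seen in training. The sketch below shows the idea with invented numbers; the helper names and the toy values are assumptions, and the 0.8 model MAE is a placeholder rather than a result from this article.

```python
from statistics import mean


def baseline_mae(train_targets, test_targets):
    """MAE of a model that always predicts the training-set mean."""
    constant = mean(train_targets)
    return mean(abs(y - constant) for y in test_targets)


def improvement_over_baseline(model_mae, base_mae):
    """Fraction of the baseline error removed by the model (1.0 would be perfect)."""
    return 1 - model_mae / base_mae


# Hypothetical fuel consumption values in l/100 km
toy_train = [8.0, 10.0, 12.0, 14.0]   # mean is 11.0
toy_test = [9.0, 13.0, 15.0]          # absolute errors against 11.0: 2, 2, 4

base = baseline_mae(toy_train, toy_test)
gain = improvement_over_baseline(0.8, base)  # 0.8 is a placeholder model MAE

print(f"Baseline MAE: {base:.3f}")
print(f"Improvement over baseline: {gain:.0%}")
```

To apply this to the tutorial, pass y_train and y_test to baseline_mae and compare the result with the MAE printed above.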

Step 7: Presenting the Findings

Let’s plot the predicted fuel consumption against the actual values. Points close to the red diagonal are predictions that closely match the actual consumption.

Python
# Plot predicted vs actual values
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red')
plt.xlabel('Actual Fuel Consumption')
plt.ylabel('Predicted Fuel Consumption')
plt.title('Actual vs Predicted Fuel Consumption')
plt.show()

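Output: a scatter plot of actual versus predicted fuel consumption, with a red diagonal line marking perfect predictions.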

Step 8: Interactive User Data Entry for Forecasting

With this interactive tool, users can enter details about a vehicle and receive an immediate estimate of its fuel consumption. It relies on the ipywidgets package and is meant to run in a Jupyter notebook.

Python
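# Import the notebook widget libraries
import ipywidgets as widgets
from IPython.display import display

# Define widgets for user input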
cylinders_widget = widgets.IntSlider(min=3, max=8, step=1, description='Cylinders:')
displacement_widget = widgets.FloatSlider(min=50, max=500, step=10, description='Displacement:')
horsepower_widget = widgets.FloatSlider(min=50, max=250, step=10, description='Horsepower:')
weight_widget = widgets.FloatSlider(min=1500, max=5500, step=100, description='Weight:')
acceleration_widget = widgets.FloatSlider(min=8, max=24, step=1, description='Acceleration:')
model_year_widget = widgets.IntSlider(min=70, max=82, step=1, description='Model Year:')
origin_widget = widgets.Dropdown(options=[1, 2, 3], description='Origin:')

# Define function to make predictions based on user input
def predict_fuel_consumption(cylinders, displacement, horsepower, weight, acceleration, model_year, origin):
    input_data = pd.DataFrame({
        'cylinders': [cylinders],
        'displacement': [displacement],
        'horsepower': [horsepower],
        'weight': [weight],
        'acceleration': [acceleration],
        'model_year': [model_year],
        'origin': [origin]
    })
    # Ensure the categorical feature 'origin' is encoded properly
    input_data['origin'] = input_data['origin'].astype('category')
    prediction = model.predict(input_data)[0]
    print(f'Predicted Fuel Consumption: {prediction:.2f} liters/100km')

# Display widgets
interactive_plot = widgets.interactive(predict_fuel_consumption, 
                                       cylinders=cylinders_widget, 
                                       displacement=displacement_widget, 
                                       horsepower=horsepower_widget, 
                                       weight=weight_widget, 
                                       acceleration=acceleration_widget, 
                                       model_year=model_year_widget, 
                                       origin=origin_widget)

display(interactive_plot)
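
The sliders keep inputs within fixed bounds, but the same protection is worth having if the model is called from a script instead of a notebook. Tree-based models such as CatBoost produce piecewise-constant predictions, so they cannot extrapolate sensibly beyond the range of the training data. The sketch below checks an input against the slider bounds used above; the function name and the example vehicle are hypothetical.

```python
# Bounds copied from the sliders defined above
FEATURE_RANGES = {
    "cylinders": (3, 8),
    "displacement": (50, 500),
    "horsepower": (50, 250),
    "weight": (1500, 5500),
    "acceleration": (8, 24),
    "model_year": (70, 82),
}
ALLOWED_ORIGINS = {1, 2, 3}


def find_input_problems(vehicle):
    """Return a list of readable problems; an empty list means the input looks usable."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        value = vehicle.get(name)
        if value is None:
            problems.append(f"{name} is missing")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    if vehicle.get("origin") not in ALLOWED_ORIGINS:
        problems.append(f"origin must be one of {sorted(ALLOWED_ORIGINS)}")
    return problems


# Hypothetical input with a model year the training data does not cover
example = {"cylinders": 4, "displacement": 120, "horsepower": 90,
           "weight": 2500, "acceleration": 15, "model_year": 95, "origin": 1}
problems = find_input_problems(example)
print(problems)
```

Running a check like this before calling model.predict makes it clear when a prediction is being requested for a vehicle unlike anything the model was trained on.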

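Output: a panel of sliders and a dropdown; adjusting any of them prints the updated predicted fuel consumption.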

Conclusion

Predicting fuel consumption with CatBoost requires understanding the algorithm's strengths, preparing the data properly, and continuously refining the model. By following the instructions in this article, even novices can use machine learning to support the development of more fuel-efficient cars.

In this blog article, we looked at how to use a public dataset to forecast fuel use with CatBoost. From data preparation to model training and assessment, we went through each step and displayed the outcomes. We also included an interactive element that makes predictions in real time from user inputs. Even if you’re a novice, this walkthrough should help you get started with CatBoost fuel consumption prediction. Have fun with your modeling!



