California House Price Prediction

Training Models for California Housing Price Forecasting

The California housing dataset is widely used to practice building machine learning models for regression tasks. We will follow the steps below to predict house prices.

Step 1: Loading California House Price Dataset

The read_csv() method reads a CSV file into a DataFrame, and the info() method gives a quick description of the data: the columns, the total number of rows, each attribute's type, and the number of non-null values.

Python

import pandas as pd
housing = pd.read_csv("https://media.w3wiki.org/wp-content/uploads/20240319120216/housing.csv")
housing.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   longitude           20640 non-null  float64
 1   latitude            20640 non-null  float64
 2   housing_median_age  20640 non-null  float64
 3   total_rooms         20640 non-null  float64
 4   total_bedrooms      20433 non-null  float64
 5   population          20640 non-null  float64
 6   households          20640 non-null  float64
 7   median_income       20640 non-null  float64
 8   median_house_value  20640 non-null  float64
 9   ocean_proximity     20640 non-null  object

As we can see, there are 20640 instances in the dataset. The total_bedrooms attribute has only 20433 non-null values, so 207 districts are missing this value. All attributes are numerical except ocean_proximity, which is a text field. The median_house_value attribute is the house price, which we need to predict using our machine learning model.
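The gap in total_bedrooms can be confirmed and, if desired, filled before training. The sketch below is one possible approach and not a step from the walkthrough itself: it uses a tiny hand-made frame named sample (hypothetical values) in place of the real housing DataFrame, counts missing entries with isna().sum(), and fills them with the column median, which is an assumed strategy rather than the only option.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing DataFrame (values are illustrative only)
sample = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
})

# Count missing values per column
missing_counts = sample.isna().sum()

# Fill missing total_bedrooms with the column median (one common choice)
median_bedrooms = sample["total_bedrooms"].median()
filled = sample.assign(
    total_bedrooms=sample["total_bedrooms"].fillna(median_bedrooms)
)

print(missing_counts)
print(filled)
```

On the real data, the same two calls would be applied to housing instead of sample. Dropping the affected rows or the whole column are alternatives with different trade-offs.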

Before getting to model training, let’s analyse how attributes in the housing data correlate with the median house value (house price). We can easily find the standard correlation coefficient using the corr() method. Since the ocean_proximity attribute field is non-numeric, we need to drop the field to calculate the correlation.

Python

def find_correlation(housing_numeric):
    # compute the standard (Pearson) correlation matrix
    corr_matrix = housing_numeric.corr()
    # return each attribute's correlation with the
    # median house value, strongest positive first
    return corr_matrix["median_house_value"].sort_values(
        ascending=False)
  
# drop ocean_proximity column
housing_numeric = housing.drop("ocean_proximity", axis=1)
# find correlation coefficient
cor_coef = find_correlation(housing_numeric)
print("Correlation Coefficient::", cor_coef)

Output:

Correlation Coefficient:: median_house_value    1.000000
median_income         0.688075
total_rooms           0.134153
housing_median_age    0.105623
households            0.065843
total_bedrooms        0.049686
population           -0.024650
longitude            -0.045967
latitude             -0.144160
Name: median_house_value, dtype: float64

Here, the median house value tends to go up as the median income increases (a correlation of about 0.69, the strongest in the list). There is also a small negative correlation with latitude (about -0.14): the median house value has a slight tendency to go down as we go north.
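To make the coefficient concrete, the following sketch computes Pearson's r by hand and compares it with the pandas result. The numbers are made-up toy values, not rows from the housing dataset, and the variable names are chosen only for illustration. Note that this coefficient captures linear relationships only.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values, not taken from the housing dataset)
toy = pd.DataFrame({
    "median_income": [1.0, 2.0, 3.0, 4.0, 5.0],
    "median_house_value": [110.0, 190.0, 320.0, 390.0, 510.0],
})

x = toy["median_income"].to_numpy()
y = toy["median_house_value"].to_numpy()

# Pearson's r: sum of products of deviations, divided by the
# square root of the product of the summed squared deviations
x_dev = x - x.mean()
y_dev = y - y.mean()
r_manual = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# pandas uses the Pearson method by default
r_pandas = toy["median_income"].corr(toy["median_house_value"])

print(r_manual, r_pandas)
```

Both values should agree up to floating-point rounding, and a value close to 1 indicates a strong positive linear relationship.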

Regression Models for California Housing Price Prediction

In this article, we will build a machine learning model that predicts the median house price using the California housing price dataset from the StatLib repository. The dataset is based on the 1990 California census and contains metrics such as population, median income, and median house value for each district. It is a supervised learning task because each instance is labeled with an expected output (the median house price). It is a multiple regression task because the prediction uses multiple features, and a univariate one because we predict a single value for each district.
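The supervised, univariate, multiple-regression framing above maps directly onto how the data is arranged for training: many feature columns and one label column. A minimal sketch, assuming a small hypothetical frame named sample in place of the full housing DataFrame:

```python
import pandas as pd

# Toy stand-in for the housing DataFrame (values are illustrative only)
sample = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 4.0, 3.6],
    "housing_median_age": [41.0, 21.0, 52.0, 52.0, 52.0, 52.0],
    "median_house_value": [452600.0, 358500.0, 341300.0,
                           342200.0, 269700.0, 299200.0],
})

# Multiple regression: several input features per district
features = sample.drop("median_house_value", axis=1)

# Univariate: a single target value (the label) per district
labels = sample["median_house_value"]

print(features.shape, labels.shape)
```

The names features and labels are arbitrary. Any regression model would then be fit on features to predict labels.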

Table of Contents

California House Price Prediction
Training Models for California Housing Price Forecasting

1. Linear Regression Model
2. Decision Tree Regression Model
3. Random Forest Regression Model
Evaluating Using Cross-Validation

Fine Tune The Models
