Implement the K-fold Technique on Regression

Regression machine learning models predict a continuous target variable, such as the price of a commodity or the sales of a firm. Below are the complete steps for implementing the K-fold cross-validation technique on regression models.

Step 1: Importing all required packages

Set up the R environment by importing all necessary packages and libraries. Below is the implementation of this step. 

R

# loading required packages
 
# package to perform data manipulation
# and visualization
library(tidyverse)
 
# package to compute
# cross-validation methods
library(caret)
 
# installing the package that
# provides the desired dataset
# (needs to be run only once)
install.packages("datarium")


Step 2: Loading and inspecting the dataset

In this step, the desired dataset is loaded into the R environment. After that, the first few rows of the dataset are printed in order to understand its structure. Below is the code to carry out this task.

R

# loading the dataset
data("marketing", package = "datarium")
 
# inspecting the dataset
head(marketing)


Output: 

  youtube facebook newspaper sales
1  276.12    45.36     83.04 26.52
2   53.40    47.16     54.12 12.48
3   20.64    55.08     83.16 11.16
4  181.80    49.56     70.20 22.20
5  216.96    12.96     70.08 15.48
6   10.44    58.68     90.00  8.64

Step 3: Building the model with K-fold algorithm

The value of K is set through the number argument of the trainControl() function, and the model is then trained according to the steps of the K-fold cross-validation algorithm. Below is the implementation.

R

# setting seed to generate a 
# reproducible random sampling
set.seed(125) 
 
# defining training control
# as cross-validation and 
# value of K equal to 10
train_control <- trainControl(method = "cv",
                              number = 10)
 
# training the model with the sales column
# as the target variable and the remaining
# columns as predictors
model <- train(sales ~., data = marketing, 
               method = "lm",
               trControl = train_control)
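
The fold assignment that trainControl(method = "cv", number = 10) performs internally can be sketched in base R. The snippet below is only an illustration of the splitting idea, not caret's actual implementation; the row count of 200 matches the marketing dataset.

```r
# a minimal sketch of 10-fold assignment, assuming 200 rows
# as in the marketing dataset
set.seed(125)
n <- 200
k <- 10

# randomly assign each row to one of the k folds
fold_id <- sample(rep(1:k, length.out = n))

# rows with fold_id == i form the validation set of fold i;
# all remaining rows form its training set
table(fold_id)   # each fold holds exactly 20 rows here
```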


Step 4: Evaluate the model performance

As described in the K-fold algorithm, the model is tested against each unique fold (or subset) of the dataset. In each case the prediction error is calculated, and the mean of all K prediction errors is treated as the final performance score of the model. Below is the code to print the final score and an overall summary of the model.

R

# printing model performance metrics
# along with other details
print(model)


Output: 

Linear Regression 

200 samples
  3 predictor

No pre-processing
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 181, 180, 180, 179, 180, 180, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  2.027409  0.9041909  1.539866

Tuning parameter 'intercept' was held constant at a value of TRUE
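
The averaging step described above can also be reproduced by hand. The following is a self-contained sketch on a small synthetic regression problem (a hypothetical example, not the marketing data), using base R's lm() and 5 folds for brevity:

```r
# hand-rolled K-fold RMSE on synthetic data (hypothetical example)
set.seed(125)
df <- data.frame(x = runif(100))
df$y <- 3 * df$x + rnorm(100, sd = 0.1)

k <- 5
fold_id <- sample(rep(1:k, length.out = nrow(df)))

# for each fold: train on the other k - 1 folds, predict on this one
rmse_per_fold <- sapply(1:k, function(i) {
  fit  <- lm(y ~ x, data = df[fold_id != i, ])
  pred <- predict(fit, newdata = df[fold_id == i, ])
  sqrt(mean((df$y[fold_id == i] - pred)^2))
})

# the mean of the per-fold errors is the final performance score
mean(rmse_per_fold)
```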
