What is the Glmnet package in R?

R is a popular programming language used for statistical computing, machine learning, and data analysis. The glmnet package in R fits generalized linear models, including linear and logistic regression, with L1 and L2 regularization. Regularization helps prevent overfitting by adding a penalty term to the objective function. The glmnet package provides an efficient and scalable implementation of this technique, making it a popular choice for machine learning and data science applications.

In data science, a common challenge is to build predictive models that generalize well to new data. Overfitting, which occurs when a model is too complex and fits the training data too closely, can be a significant problem. Regularization is one approach to preventing overfitting, but it can be difficult to implement efficiently for high-dimensional datasets, where the number of predictors is large relative to the number of observations. The glmnet package offers an effective solution to this problem.

Regularized Regression

Regularized regression is a type of regression that adds a penalty term to the cost function to reduce overfitting. The common variants are:

Lasso Regression: A type of regularized regression that adds an L1 penalty term to the cost function.
Ridge Regression: A type of regularized regression that adds an L2 penalty term to the cost function.
Elastic Net Regression: A type of regularized regression that adds both L1 and L2 penalty terms to the cost function.
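For reference, the three penalties can be written as one objective. The form below is the one documented for glmnet's Gaussian family. The symbols are notation introduced here rather than taken from the text above: N is the number of observations, λ ≥ 0 is the overall penalty strength, and α in [0, 1] is the mixing parameter.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2
+ \lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\right]
```

Setting α = 1 leaves only the L1 term (lasso), α = 0 leaves only the L2 term (ridge), and intermediate values give the elastic net.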

Code Explanation:

The primary function in the glmnet package is “glmnet”, which fits a generalized linear model with regularization. The function takes several arguments, including the predictor variables (X), the response variable (y), and the type of regularization (set through alpha). The syntax for the function is as follows:

glmnet(X, y, family = "gaussian", alpha = 1, lambda = NULL)

In this syntax, X is the matrix of predictor variables, y is the response variable, and family specifies the distribution of the response variable (e.g., "gaussian", "binomial", "poisson"). alpha sets the mix of L1 and L2 regularization: a value of 1 gives pure L1 (lasso), a value of 0 gives pure L2 (ridge), and values in between give the elastic net. lambda sets the strength of the regularization penalty; when it is left as NULL, glmnet computes its own decreasing sequence of lambda values and fits a model for each one.
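To make the role of alpha concrete, here is a minimal sketch that fits ridge, elastic net, and lasso models on the built-in mtcars data and counts how many coefficients are non-zero at one penalty strength. The helper count_nonzero and the choice of s = 1 are illustrative assumptions, not part of glmnet.

```r
library(glmnet)

data(mtcars)
x <- as.matrix(mtcars[, -1])  # predictors: every column except mpg
y <- mtcars$mpg               # response

# Hypothetical helper: number of non-zero coefficients (intercept excluded)
# at penalty strength s.
count_nonzero <- function(fit, s) {
  b <- as.matrix(coef(fit, s = s))
  sum(b[-1, 1] != 0)
}

alphas <- c(ridge = 0, elastic_net = 0.5, lasso = 1)
fits <- lapply(alphas, function(a) glmnet(x, y, family = "gaussian", alpha = a))

# s = 1 is an arbitrary penalty strength chosen for illustration.
nonzero <- sapply(fits, count_nonzero, s = 1)
print(nonzero)
```

Ridge shrinks coefficients without setting them exactly to zero, so it should keep all predictors, while the L1 part of the penalty lets the lasso and the elastic net drop some.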

Example 1

Step 1: Install the glmnet package in R using the following command:

install.packages("glmnet")

Then load the package:

R

library(glmnet)

Step 2: Load the mtcars dataset and build the predictor matrix and response vector.

R

# Load the mtcars dataset 
data(mtcars)  
  
# Select all columns except the first as predictors. 
X <- as.matrix(mtcars[, -1])  
# Select the first column as the response. 
y <- mtcars[, 1]

Step 3: Fit a regularized regression model using the glmnet function.

The following code fits a lasso regression model (alpha = 1). Note that summary(model) only lists the components stored in the fitted object, with their length, class, and mode; it does not print the coefficients. The number of non-zero coefficients, the lambda values, and the coefficients themselves are stored in the df, lambda, and beta components.

R

# Fit a regularized linear regression model 
model <- glmnet(X, y, family = "gaussian", alpha = 1)
summary(model)

Output:

          Length Class     Mode   
a0         79    -none-    numeric
beta      790    dgCMatrix S4     
df         79    -none-    numeric
dim         2    -none-    numeric
lambda     79    -none-    numeric
dev.ratio  79    -none-    numeric
nulldev     1    -none-    numeric
npasses     1    -none-    numeric
jerr        1    -none-    numeric
offset      1    -none-    logical
call        5    -none-    call   
nobs        1    -none-    numeric

Step 4: Plot the model

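plot(model) plots the coefficient paths: each estimated coefficient is drawn against the L1 norm of the coefficient vector (the default x-axis), which grows as the regularization parameter lambda shrinks. Setting label = TRUE labels each curve with its variable index.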

R

plot(model, label = TRUE)

Output:

L1 Norm vs Estimated coefficients

In the above graph, each curve traces the path of one coefficient against the L1 norm of the coefficient vector as lambda varies.
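The same paths can also be drawn directly against lambda. A minimal sketch, assuming the xvar argument of glmnet's plot method; the x-axis is lambda on a log scale, and its exact orientation depends on the installed glmnet version.

```r
library(glmnet)

data(mtcars)
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg
model <- glmnet(x, y, family = "gaussian", alpha = 1)

# Coefficient paths against lambda (log scale) instead of the L1 norm.
plot(model, xvar = "lambda", label = TRUE)
```

This view makes it easier to read off which predictors are still in the model at a given penalty strength.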

Step 5: Get the model coefficients

R

# Coefficients at lambda = 0.1; a "." marks a coefficient shrunk exactly to zero.
coef(model, s = 0.1)

Output:

1 x 1 sparse Matrix of class "dgCMatrix"
                     s1
(Intercept) 20.12070307
cyl         -0.21987003
disp         .         
hp          -0.01300595
drat         0.77162507
wt          -2.63787681
qsec         0.46074875
vs           0.11747113
am           2.11349978
gear         0.30437026
carb        -0.46452172

Step 6: Prediction

Generate predictions with the predict function. The following code predicts the response for the training data X using the lasso model; to predict for new data, pass a matrix with the same columns as X.

R

# Predict the response variable (y) from the predictor matrix (X) at lambda = 0.1.
# Without s, predict returns one column of predictions per lambda on the path.
y_pred <- predict(model, newx = X, s = 0.1)
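Predictions on the training rows say little about how the model generalizes. As a rough sketch, the code below holds out a few rows of mtcars, fits the lasso on the rest, and computes the root mean squared error on the held-out rows. The seed, the hold-out size of 8, and the name test_idx are arbitrary choices made for illustration.

```r
library(glmnet)

data(mtcars)
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

set.seed(1)                             # arbitrary seed for a repeatable split
test_idx <- sample(nrow(x), size = 8)   # hold out 8 of the 32 cars

model <- glmnet(x[-test_idx, ], y[-test_idx], family = "gaussian", alpha = 1)

# s selects the lambda to predict at; 0.1 matches the coefficient step above.
y_pred <- predict(model, newx = x[test_idx, ], s = 0.1)

# Root mean squared error on the held-out rows.
rmse <- sqrt(mean((y[test_idx] - y_pred)^2))
print(rmse)
```

With only 32 observations the error estimate is noisy, which is one reason cross-validation is usually preferred for choosing lambda.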

Example 2

Load the “glmnet” package, which provides functions for fitting regularized linear models.
Load the “mtcars” dataset, which contains information about car models and their performance.
Create a matrix of predictor variables (X) and a response vector (y) from the mtcars dataset.
Fit a lasso model (alpha = 1) with cross-validation using the “cv.glmnet” function; nfolds = 5 sets the number of cross-validation folds.
List the components of the fitted object using summary(fit).

R

# Load the "glmnet" package 
library(glmnet) 
  
# Load the "mtcars" dataset 
data(mtcars) 
  
# Select all columns except the first as predictors. 
X <- as.matrix(mtcars[, -1])  
# Select the first column as the response. 
y <- mtcars[, 1]  
# Fit a LASSO model with cross-validation using the "cv.glmnet" function. 
fit <- cv.glmnet(X, y, alpha = 1, nfolds = 5) 
  
# Summary 
summary(fit)

Output:

           Length Class  Mode     
lambda     79     -none- numeric  
cvm        79     -none- numeric  
cvsd       79     -none- numeric  
cvup       79     -none- numeric  
cvlo       79     -none- numeric  
nzero      79     -none- numeric  
call        5     -none- call     
name        1     -none- character
glmnet.fit 12     elnet  list     
lambda.min  1     -none- numeric  
lambda.1se  1     -none- numeric  
index       2     -none- numeric

Plot the cross-validation results using the “plot” function.

R

# Plot the cross-validation results using the "plot" function. 
plot(fit) 

Cross-validation curve
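The summary output above lists lambda.min and lambda.1se components. A minimal sketch of how they might be read and used to extract coefficients follows; the seed is an arbitrary choice, added because the fold assignment is random, and coefs is a name introduced here for illustration.

```r
library(glmnet)

data(mtcars)
x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

set.seed(123)  # arbitrary seed; cv.glmnet assigns folds at random
fit <- cv.glmnet(x, y, alpha = 1, nfolds = 5)

# lambda.min minimizes the mean cross-validated error;
# lambda.1se is the largest lambda within one standard error of that minimum.
print(c(lambda.min = fit$lambda.min, lambda.1se = fit$lambda.1se))

# Coefficients at each choice, side by side.
coefs <- cbind(
  lambda.min = as.matrix(coef(fit, s = "lambda.min"))[, 1],
  lambda.1se = as.matrix(coef(fit, s = "lambda.1se"))[, 1]
)
print(round(coefs, 4))
```

lambda.1se applies a stronger penalty, so it typically gives a simpler model at the cost of slightly higher cross-validated error.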

Predict the response variable and plot the actual vs. predicted values. When no s is given, predict on a cv.glmnet object uses lambda.1se by default.

R

# Predict the response variable (y) 
y_pred <- predict(fit, X) 
  
# Plot Actual vs Predicted 
plot(y,y_pred, 
     xlab = 'Actual', 
     ylab = 'Predicted', 
     main = 'Actual vs Predicted')

Output:

Actual (y) vs predicted

Refer to this Google Colab notebook for the entire code.
