Steps to calculate studentized residuals in Python
Step 1: Import the libraries.
First, import the libraries that the program uses.
Python3
# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt
Step 2: Create a data frame.
Next, create a data frame with the help of the pandas package. The snippet is given below.
Python3
# Creating dataframe
dataframe = pd.DataFrame({
    'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
    'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})
Step 3: Build a simple linear regression model.
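Now we need to fit a simple linear regression model to the dataset we created. For fitting a simple linear regression model from a formula, the statsmodels package provides the ols() function.
Syntax:
statsmodels.formula.api.ols(formula, data)
Parameters:
- formula : A string that describes the model, such as 'Score ~ Benchmark'. The variable to the left of ~ is the dependent variable and the variable to the right is the independent variable
- data : The data frame that holds the variables named in the formula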
Example:
Python3
# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()
Step 4: Produce the studentized residuals.
To produce a data frame that contains the studentized residual of each observation in the dataset, we can call the outlier_test() method of the fitted model.
Syntax:
simple_regression_model.outlier_test()
The method returns a data frame with one row per observation.
Python3
# Producing studentized residual
stud_res = simple_regression_model.outlier_test()
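If only the residual values are needed, without the p-values, the influence object of the fitted model is another route. The following is a minimal sketch, assuming the same data as above. The variable names model, influence, external and internal are illustrative. The externally studentized residuals should be the same values that outlier_test() reports.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({
    'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
    'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})

model = ols('Score ~ Benchmark', data=dataframe).fit()

# Influence measures of the fitted model
influence = model.get_influence()

# Externally studentized: the variance estimate leaves out the observation itself
external = influence.resid_studentized_external

# Internally studentized: the variance estimate uses all observations
internal = influence.resid_studentized_internal

# Compare against the first column returned by outlier_test()
from_outlier_test = model.outlier_test()['student_resid'].to_numpy()
print(np.allclose(external, from_outlier_test))
```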
Below is the complete implementation.
Python3
# Python program to calculate studentized residual

# Importing necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

# Creating dataframe
dataframe = pd.DataFrame({
    'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
    'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})

# Building simple linear regression model
simple_regression_model = ols('Score ~ Benchmark', data=dataframe).fit()

# Producing studentized residual
result = simple_regression_model.outlier_test()
print(result)
Output:
The output is a data frame with one row per observation and three columns:
- student_resid : The studentized residual
- unadj_p : The unadjusted p-value of the studentized residual
- bonf(p) : The Bonferroni-corrected p-value of the studentized residual
We can see that the studentized residual for the first observation in the dataset is -1.121201, the studentized residual for the second observation is 0.954871, and so on.
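Once the data frame is available, the rule of thumb of flagging observations whose studentized residual exceeds 3 in absolute value can be applied with a simple filter. The sketch below assumes that threshold. The names threshold and outliers are introduced here for illustration.

```python
import pandas as pd
from statsmodels.formula.api import ols

dataframe = pd.DataFrame({
    'Score': [80, 95, 80, 78, 84, 96, 86, 75, 97, 89],
    'Benchmark': [27, 28, 18, 18, 29, 30, 25, 25, 24, 29]})

model = ols('Score ~ Benchmark', data=dataframe).fit()
result = model.outlier_test()

# Rule-of-thumb cutoff (an assumption, adjust to your own needs)
threshold = 3

# Keep only the rows whose absolute studentized residual exceeds the cutoff
outliers = result[result['student_resid'].abs() > threshold]

print(outliers)
print("Number of flagged observations:", len(outliers))
```

For this small dataset, none of the ten observations crosses the cutoff, so the filtered frame comes back empty.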
How to Calculate Studentized Residuals in Python?
A studentized residual is the quotient obtained by dividing a residual by its estimated standard deviation. It is a crucial tool in the detection of outliers. As a rule of thumb, any observation in a dataset whose studentized residual is greater than 3 in absolute value can be considered an outlier.
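To make the definition concrete, the sketch below computes the residuals and their estimated standard deviations by hand with numpy, using the ten Score and Benchmark observations from the steps of this article. It assumes the externally studentized variant, in which the variance for each observation is estimated with that observation left out. All variable names are illustrative.

```python
import numpy as np

# Benchmark (predictor) and Score (response)
x = np.array([27, 28, 18, 18, 29, 30, 25, 25, 24, 29], dtype=float)
y = np.array([80, 95, 80, 78, 84, 96, 86, 75, 97, 89], dtype=float)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Ordinary least squares fit and raw residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Leverage of each observation: diagonal of the hat matrix
hat_matrix = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(hat_matrix)

# Leave-one-out estimate of the error variance for each observation
sse = residuals @ residuals
loo_variance = (sse - residuals**2 / (1 - leverage)) / (n - p - 1)

# Residual divided by its estimated standard deviation
studentized = residuals / np.sqrt(loo_variance * (1 - leverage))

print(studentized)
```

The first two values should agree with the studentized residuals quoted for the first and second observations, -1.121201 and 0.954871.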
The following Python libraries should already be installed on our system:
- pandas
- numpy
- statsmodels
- matplotlib
You can install these packages by running the command below in the terminal.
pip3 install pandas numpy statsmodels matplotlib