Mathematical Concepts Used Here
The standardized residual measures how far each observed value of the response variable deviates from its fitted value in a linear regression model, expressed in units of the residuals' standard deviation. It is computed as:
standardized residual = residual / (sqrt(MSE) * sqrt(1 – hii))
where MSE is the mean squared error of the model, hii is the leverage of observation i, and residual is that observation's residual. The leverage quantifies how much influence an observation's predictor value has on its own fitted value.
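As a concrete illustration, the quantities in this formula can be computed by hand. The sketch below uses Python with made-up data (the article's own plotting example uses R); it relies on the closed form of the leverage for simple regression, hii = 1/n + (xi − x̄)² / Σ(xj − x̄)².

```python
# Minimal sketch: standardized residuals for a simple linear regression,
# using only the standard library. The data are fabricated for illustration.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

# Fit y = b0 + b1*x by least squares.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residuals and MSE (n - 2 degrees of freedom: two estimated coefficients).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in residuals) / (n - 2)

# Leverage h_ii for simple regression: 1/n + (x_i - x_bar)^2 / Sxx.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Standardized residual: e_i / (sqrt(MSE) * sqrt(1 - h_ii)).
std_resid = [e / (math.sqrt(mse) * math.sqrt(1 - h))
             for e, h in zip(residuals, leverage)]
```

A useful sanity check on such a computation: the residuals of a model with an intercept sum to zero, and the leverages sum to the number of estimated coefficients (here 2).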
The standardized residual plot can be used to check the assumptions of linear regression. It plots the standardized residuals against the model's fitted values. If the assumptions hold, the standardized residuals should be scattered randomly around zero, with no clear patterns or trends.
In R, you can produce a residual plot by fitting a model with lm( ) and then calling plot( ) on the fitted object with the which argument set to 1, which draws the residuals against the fitted values.
In linear regression analysis, we model the relationship between a dependent variable y and one or more independent variables X. A linear regression model with a single independent variable has the general equation:
y = β0 + β1X + ε
where y is the dependent variable, X is the independent variable, β0 is the intercept, β1 is the slope coefficient, and ε is the error term. The error term represents variation in the dependent variable that is not explained by the independent variable. Linear regression analysis aims to estimate the coefficients β0 and β1 that best fit the data, so that y can be predicted from a given value of X.
Fitting a linear regression model means minimizing the sum of squared errors (SSE), the total of the squared differences between the observed and predicted values of y. This can be written mathematically as:
SSE = Σ(yi – ŷi)²
where yi is the observed value of y, ŷi is the predicted value, and Σ denotes the sum over all observations i. The coefficients β0 and β1 can be estimated by the least squares method, which finds the values that minimize the sum of squared errors.
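For a single predictor, the least squares solution has a closed form: β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and β0 = ȳ − β1x̄. The sketch below, with illustrative data and arbitrary 0.1 perturbations, simply confirms that nudging the coefficients away from these estimates can only increase the SSE:

```python
# Sketch: the closed-form least-squares coefficients minimize SSE.
# The data are fabricated for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.1, 3.8, 5.2]
n = len(x)

def sse(b0, b1):
    # Sum of squared errors: sum of (y_i - (b0 + b1*x_i))^2.
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Closed-form least-squares solution for one predictor.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# SSE at the least-squares estimates; any perturbation should exceed it.
best = sse(b0, b1)
```

Because the SSE is a convex function of (β0, β1), this minimum is global, which is why the closed-form estimates need no iterative search.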
Once the coefficients of the linear regression model have been estimated, we can use them to predict the dependent variable y for a given value of X. The predicted value of y is written as:
ŷ = β0 + β1X
The residual is the difference between the observed value of y and the predicted value, written as:
ei = yi – ŷi
The residuals represent the variation in the dependent variable that is not accounted for by the independent variable. If the residuals are normally distributed with a mean of zero and constant variance, the linear regression model is considered valid. If the residuals show a pattern, such as nonlinearity or heteroscedasticity, the model may not be reliable, and further steps may be needed to improve it.
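Two algebraic consequences of least squares fitting with an intercept are worth knowing when inspecting residuals: they sum to zero and they are uncorrelated with the predictor, by construction. A minimal sketch with illustrative data:

```python
# Sketch: algebraic properties of least-squares residuals.
# The data are fabricated for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 2.8, 4.1, 4.9, 6.3, 6.9]
n = len(x)

# Least-squares fit.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Residual e_i = y_i - y_hat_i, with fitted value y_hat_i = b0 + b1*x_i.
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

mean_resid = sum(e) / n                          # zero by construction
x_dot_e = sum(xi * ei for xi, ei in zip(x, e))   # zero by construction
```

Because these identities hold for any data, a residual plot centered away from zero signals a computational error rather than a modeling problem; patterns such as curvature or fanning, by contrast, are genuine diagnostics.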
The standardized residuals are the residuals divided by their estimated standard deviation. They are useful for identifying outliers or influential observations that may affect the model. An observation whose standardized residual exceeds 2 in absolute value is commonly flagged as a potential outlier and should be examined more closely to determine whether it is unduly influencing the model.
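This rule of thumb can be sketched directly. The data below are fabricated, with the last y value deliberately pulled off the line so that it gets flagged:

```python
# Sketch: flagging observations with |standardized residual| > 2.
# Fabricated data; the last point is a deliberate outlier.
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 12.0]   # last point pulled far off the line
n = len(x)

# Least-squares fit.
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residuals, MSE, leverage, and standardized residuals.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - 2)
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
std_resid = [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, leverage)]

# Flag observations with |standardized residual| > 2 for closer inspection.
flagged = [i for i, r in enumerate(std_resid) if abs(r) > 2]
```

Note that the threshold of 2 is a convention, not a hard rule; a flagged point warrants investigation, not automatic removal.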