Correlation Analysis

Correlation analysis is a statistical technique for measuring the strength and direction of the relationship between two continuous variables. The most common measure is the Pearson correlation coefficient, denoted "r", which quantifies the linear relationship between the two variables:

$$ r = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}} $$

where,

  • $r$: Correlation coefficient
  • $x_i$: i-th value of the first dataset X
  • $\bar{x}$: Mean of the first dataset X
  • $y_i$: i-th value of the second dataset Y
  • $\bar{y}$: Mean of the second dataset Y

The coefficient r takes values between -1 (perfect negative linear correlation) and 1 (perfect positive linear correlation), with 0 indicating no linear correlation.
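As a rough check of this definition, the coefficient can be computed directly from the deviations from the means and compared with R's built-in cor(). This is a minimal sketch using the study-hours values from this section's R example; the names x, y, dx, dy, r_manual, r_pos, and r_neg are introduced here only for illustration.

```r
# Illustrative data (same values as the study-hours example in this section)
x <- c(5, 7, 3, 8, 6, 9)
y <- c(80, 85, 60, 90, 75, 95)

# Deviations of each value from its mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson r: sum of cross-products divided by the square root of the
# product of the two sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual

# Should agree with the built-in function up to floating-point error
all.equal(r_manual, cor(x, y))

# Boundary cases: exact increasing and decreasing linear relationships
r_pos <- cor(x, 2 * x + 1)    # perfect positive correlation
r_neg <- cor(x, -3 * x + 10)  # perfect negative correlation
c(r_pos, r_neg)
```

The last two calls illustrate the ends of the range: any exact increasing linear function of x gives a coefficient of 1, and any exact decreasing one gives -1, regardless of the size of the slope.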




Correlation using R

R

# Sample data
study_hours <- c(5, 7, 3, 8, 6, 9)
exam_scores <- c(80, 85, 60, 90, 75, 95)
 
# Calculate Pearson correlation
correlation <- cor(study_hours, exam_scores)
correlation

                    

Output:

[1] 0.9569094

Visualize the data and correlation

R

# Visualize the data and correlation
plot(study_hours, exam_scores, main = "Scatterplot of Study Hours vs. Exam Scores")
# Add regression line
abline(lm(exam_scores ~ study_hours), col = "red")
text(3, 90, paste("Correlation: ", round(correlation, 2)))

                    

Output:


[Figure: scatterplot of study hours vs. exam scores with the red regression line and the correlation label]


Sample Data: We start with two vectors, study_hours and exam_scores, which hold hypothetical data. study_hours contains the number of hours each student spent studying, and exam_scores contains the corresponding exam scores. This data is used for both the correlation and the regression line.

  • Calculate Pearson Correlation: The cor() function calculates the Pearson correlation coefficient (its default method) between study_hours and exam_scores. The result is stored in the correlation variable.
  • Visualize the Data and Correlation: plot(study_hours, exam_scores, main = "Scatterplot of Study Hours vs. Exam Scores") creates a scatterplot with study_hours on the x-axis and exam_scores on the y-axis. The main argument sets the plot title.
  • abline(lm(exam_scores ~ study_hours), col = "red") adds a red regression line to the scatterplot. The lm() function fits a linear regression model of exam_scores on study_hours, and abline() draws the fitted line on the existing plot.
  • text(3, 90, paste("Correlation: ", round(correlation, 2))) adds a text label at the plot coordinates x = 3, y = 90 showing the correlation coefficient. The round() function rounds the coefficient to two decimal places.
  • The scatterplot visually shows the relationship between study hours and exam scores. The red regression line is the least-squares linear fit that predicts exam scores from study hours. The text in the plot displays the calculated correlation coefficient.
  • Interpretation: The scatterplot shows a positive linear trend: as the number of study hours increases, exam scores tend to increase. The coefficient of about 0.96 indicates that this linear relationship is strong.

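The red regression line provides a quantitative estimate of this relationship: its slope is the predicted change in exam score for each additional hour of study. The sign of the slope always matches the sign of the correlation, so the positive slope here indicates a positive correlation. The steepness of the line, however, depends on the units of the variables and does not measure the strength of the correlation; strength is reflected in how closely the points cluster around the line.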
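To make the difference between slope and correlation concrete, the following sketch refits the same model after converting study time from hours to minutes. The names fit_hours, fit_minutes, study_minutes, slope_hours, slope_minutes, and slope_from_r are illustrative and not part of the original example. Under this change of units the slope should shrink by a factor of 60 while the correlation stays the same.

```r
# Same sample data as in the example above
study_hours <- c(5, 7, 3, 8, 6, 9)
exam_scores <- c(80, 85, 60, 90, 75, 95)

# Fit the regression and inspect the intercept and slope (points per hour)
fit_hours <- lm(exam_scores ~ study_hours)
coef(fit_hours)
slope_hours <- unname(coef(fit_hours)["study_hours"])

# In simple linear regression the slope equals r * sd(y) / sd(x)
slope_from_r <- cor(study_hours, exam_scores) * sd(exam_scores) / sd(study_hours)
all.equal(slope_hours, slope_from_r)

# Re-express the predictor in minutes and refit
study_minutes <- study_hours * 60
fit_minutes <- lm(exam_scores ~ study_minutes)
slope_minutes <- unname(coef(fit_minutes)["study_minutes"])

# The slope changes with the units (ratio of 60) ...
slope_hours / slope_minutes

# ... but the correlation does not
all.equal(cor(study_hours, exam_scores), cor(study_minutes, exam_scores))
```

For this data the fitted line works out to exam_scores = 46 + 5.5 × study_hours, so each additional hour of study is associated with a predicted gain of 5.5 points.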

Correlation and Regression with R

Correlation and regression analysis are two fundamental statistical techniques used to examine the relationships between variables. R is a programming language and environment for statistical computing and graphics, which makes it well suited to these analyses. This article gives an overview of how to perform correlation and regression analysis in R.
