Correlation Coefficient Formula
Correlation Coefficient Formula: The correlation coefficient is a statistical measure that quantifies the strength and direction of the linear relationship between two variables, for example between predicted and observed values in a statistical analysis.
Correlation coefficients are used to measure how strong the relationship between two variables is. There are several types of correlation coefficient; one of the most popular is Pearson's correlation coefficient (also known as Pearson's r), which is commonly used in linear regression.
This article covers what correlation is, the correlation coefficient formula, the main types of correlation coefficient, and worked examples and problems.
Table of Contents
- What is Correlation?
- Correlation Coefficient Definition
- What is Correlation Coefficient Formula?
- Understanding Correlation Coefficient
- Types of Correlation Coefficient Formula
- Pearson’s Correlation Coefficient Formula
- Sample Correlation Coefficient Formula
- Population Correlation Coefficient Formula
- Pearson’s Correlation
- How to Find Pearson’s Correlation Coefficient?
- Linear Correlation Coefficient
- Cramer’s V Correlation
- Correlation Coefficient Formula Problems
What is Correlation?
Correlation is a statistical measure that describes the extent to which two variables are related. It quantifies the direction and strength of the linear relationship between them. A correlation between two variables is one of three types:
- Positive Correlation
- Zero Correlation
- Negative Correlation
Correlation Coefficient Definition
A statistical measure that quantifies the strength and direction of the linear relationship between two variables is called the correlation coefficient. It is usually denoted by the symbol r and ranges from -1 to 1.
What is Correlation Coefficient Formula?
The correlation coefficient formula is used to determine how strong the linear relationship between two variables is. It yields a value between -1 and 1, where:
- -1 indicates a perfect negative linear relationship
- 1 indicates a perfect positive linear relationship
- 0 indicates no linear relationship
Understanding Correlation Coefficient
- A correlation coefficient of -1 means that every increase in one variable is matched by a decrease in the other by a fixed proportion. For example, at a constant rate of fuel consumption, the amount of fuel left in a tank decreases in perfect correlation with the distance driven.
- A correlation coefficient of 1 means that every increase in one variable is matched by an increase in the other by a fixed proportion. For example, shoe size goes up in perfect correlation with foot length.
- A correlation coefficient of 0 means that an increase in one variable is not accompanied by any consistent linear increase or decrease in the other. The two variables have no linear relationship, although they may still be related in a non-linear way.
Types of Correlation Coefficient Formula
The main correlation coefficient formulas are given below.
Pearson’s Correlation Coefficient Formula
The formula for Pearson's correlation coefficient is:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
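The formula above translates almost line for line into code. The following is a minimal Python sketch (the function name `pearson_r` is our own, not a library API); it assumes two equal-length lists of numbers, neither of which is constant, since the denominator is zero otherwise.

```python
import math


def pearson_r(x, y):
    """Pearson's r via the raw-sums (computational) formula."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


# Data from Problem 5
print(round(pearson_r([5, 9, 14, 16], [6, 10, 16, 20]), 4))  # 0.993
```

Working from raw sums is convenient by hand, but with very large values it can lose floating-point precision; subtracting the means first (the deviation-score form) is numerically safer.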
Sample Correlation Coefficient Formula
The sample correlation coefficient formula is:
[Tex] r_{xy}~=~\frac{s_{xy}}{s_x s_y} [/Tex]
where,
- sxy is the sample covariance of x and y
- sx and sy are the sample standard deviations of x and y
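A minimal sketch of the sample formula, assuming the data are a sample. It uses Python's standard `statistics` module for the means and for the n - 1 standard deviation (`statistics.stdev`); the helper names `sample_covariance` and `sample_r` are illustrative.

```python
from statistics import mean, stdev


def sample_covariance(x, y):
    """Sample covariance s_xy, using the n - 1 divisor."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def sample_r(x, y):
    """r_xy = s_xy / (s_x * s_y), with sample (n - 1) statistics throughout."""
    return sample_covariance(x, y) / (stdev(x) * stdev(y))


# Data from Problem 3
print(round(sample_r([7, 9, 14], [17, 19, 21]), 4))  # 0.9707
```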
Population Correlation Coefficient Formula
The population correlation coefficient formula is:
[Tex] \rho_{xy}~=~\frac{\sigma_{xy}}{\sigma_x \sigma_y} [/Tex]
where,
- σx and σy are the population standard deviations
- σxy is the population covariance
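The population version differs only in dividing by n instead of n - 1. In this sketch (helper names are illustrative), `statistics.pstdev` supplies the population standard deviations. Because the same divisor appears in the covariance and in each standard deviation, it cancels, so ρ computed on a list of numbers equals r computed on the same list.

```python
from statistics import pstdev


def population_covariance(x, y):
    """Population covariance sigma_xy, using the n divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def population_rho(x, y):
    """rho_xy = sigma_xy / (sigma_x * sigma_y)."""
    return population_covariance(x, y) / (pstdev(x) * pstdev(y))


# Same numbers as Problem 3, now treated as a whole population
print(round(population_rho([7, 9, 14], [17, 19, 21]), 4))  # 0.9707
```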
Pearson’s Correlation
Pearson's correlation is the most common correlation coefficient in statistics. Its full name is Pearson's product-moment correlation coefficient, PPMC for short. It measures the linear relationship between two sets of data. Two symbols are used for it: the Greek letter rho (ρ) for a population and the letter r for a sample correlation coefficient.
How to Find Pearson’s Correlation Coefficient?
Follow the steps below to find Pearson's correlation coefficient for a given data set.
Step 1: Make a chart of the given data with columns for the subject, x, and y, and add three more columns: xy, x², and y².
Step 2: Multiply the x and y values in each row to fill the xy column. For example, if x is 24 and y is 65, then xy = 24 × 65 = 1560.
Step 3: Now, take the square of the numbers in the x column and fill the x² column.
Step 4: Now, take the square of the numbers in the y column and fill the y² column.
Step 5: Add up all the values in each column and write the totals at the bottom. The Greek letter sigma (Σ) denotes summation.
Step 6: Now, use the formula for Pearson’s correlation coefficient:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
The sign of R tells you whether the relationship is positive or negative, and its magnitude tells you how strong it is.
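Steps 1 to 5 are mechanical, so they are easy to automate. The sketch below (the name `correlation_worksheet` is hypothetical) builds the x, y, xy, x², and y² columns and their totals for the age and glucose data used in Problem 1.

```python
def correlation_worksheet(x, y):
    """Steps 1-5: build the x, y, xy, x², y² columns and their column totals."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rows = [(a, b, a * b, a * a, b * b) for a, b in zip(x, y)]
    # zip(*rows) transposes the rows into columns so each column can be summed
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


x = [42, 23, 22, 47, 50, 60]  # ages from Problem 1
y = [98, 68, 73, 79, 88, 82]  # glucose levels from Problem 1
rows, totals = correlation_worksheet(x, y)

print("".join(f"{h:>8}" for h in ("x", "y", "xy", "x²", "y²")))
for row in rows:
    print("".join(f"{v:>8}" for v in row))
print("".join(f"{v:>8}" for v in totals), "<- Σ")
```

The last printed line holds ∑x, ∑y, ∑xy, ∑x², and ∑y², which are exactly the inputs Step 6 needs.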
Linear Correlation Coefficient
Pearson's correlation coefficient is the linear correlation coefficient. It returns a value between -1 and +1, where -1 indicates a perfect negative correlation and +1 a perfect positive correlation. A value of 0 means there is no linear correlation, which is also known as zero correlation.
The following "crude estimates" can be used to interpret the strength of a correlation measured with Pearson's r:
r Value | Crude Estimates |
---|---|
+.70 or higher | Very strong positive relationship |
+.40 to +.69 | Strong positive relationship |
+.30 to +.39 | Moderate positive relationship |
+.20 to +.29 | Weak positive relationship |
+.01 to +.19 | No or negligible relationship |
0 | No relationship [zero correlation] |
-.01 to -.19 | No or negligible relationship |
-.20 to -.29 | Weak negative relationship |
-.30 to -.39 | Moderate negative relationship |
-.40 to -.69 | Strong negative relationship |
-.70 or lower | Very strong negative relationship |
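The table can be applied programmatically. This is a minimal sketch: `describe_r` and `BANDS` are illustrative names, each band is treated as a lower bound on |r| (so a value such as 0.695, which falls in the gap between .69 and .70, is labeled strong), and anything outside [-1, 1] is rejected as an impossible result.

```python
# Lower bounds on |r|, checked from the strongest band down (from the table above)
BANDS = [
    (0.70, "Very strong {} relationship"),
    (0.40, "Strong {} relationship"),
    (0.30, "Moderate {} relationship"),
    (0.20, "Weak {} relationship"),
]


def describe_r(r):
    """Map a Pearson r value to the crude label used in the table above."""
    if not -1 <= r <= 1:
        raise ValueError(f"r = {r} is impossible: r must lie between -1 and 1")
    if r == 0:
        return "No relationship (zero correlation)"
    direction = "positive" if r > 0 else "negative"
    for lower, template in BANDS:
        if abs(r) >= lower:
            return template.format(direction)
    return "No or negligible relationship"


print(describe_r(0.5071))  # Strong positive relationship
print(describe_r(-0.35))   # Moderate negative relationship
```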
Cramer’s V Correlation
Cramér's V plays a role similar to Pearson's correlation coefficient, but it measures the association between two categorical (nominal) variables arranged in a contingency table, and it is typically used for tables larger than 2×2. Cramér's V ranges from 0 to 1: a value close to 0 indicates very little association between the variables, while a value close to 1 indicates a very strong association.
The “crude estimates” for interpreting strengths of correlations using Cramer’s V Correlation:
Cramer’s V | Crude Estimates |
---|---|
.25 or higher | Very strong relationship |
.15 to .25 | Strong relationship |
.11 to .15 | Moderate relationship |
.06 to .10 | Weak relationship |
.01 to .05 | No or negligible relationship |
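For reference, Cramér's V is usually computed from Pearson's chi-squared statistic χ² for a k × m contingency table with n observations, as V = √(χ² / (n · (min(k, m) - 1))). The sketch below implements that standard formula in plain Python; `cramers_v` is an illustrative name, and it assumes at least two rows and two columns and no row or column whose total is zero.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n_rows, n_cols = len(table), len(table[0])
    if min(n_rows, n_cols) < 2:
        raise ValueError("need at least two rows and two columns")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson's chi-squared: sum over cells of (observed - expected)^2 / expected,
    # where expected = row total * column total / n under independence.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


print(cramers_v([[10, 0], [0, 10]]))    # 1.0 (perfect association)
print(cramers_v([[10, 10], [10, 10]]))  # 0.0 (no association)
```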
Correlation Coefficient Formula Problems
Problem 1: Calculate the correlation coefficient from the following table:
SUBJECT | AGE (X) | GLUCOSE LEVEL (Y) |
---|---|---|
1 | 42 | 98 |
2 | 23 | 68 |
3 | 22 | 73 |
4 | 47 | 79 |
5 | 50 | 88 |
6 | 60 | 82 |
Solution:
Make a table from the given data and add three more columns of XY, X², and Y².
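SUBJECT | AGE (X) | GLUCOSE LEVEL (Y) | XY | X² | Y² |
---|---|---|---|---|---|
1 | 42 | 98 | 4116 | 1764 | 9604 |
2 | 23 | 68 | 1564 | 529 | 4624 |
3 | 22 | 73 | 1606 | 484 | 5329 |
4 | 47 | 79 | 3713 | 2209 | 6241 |
5 | 50 | 88 | 4400 | 2500 | 7744 |
6 | 60 | 82 | 4920 | 3600 | 6724 |
∑ | 244 | 488 | 20319 | 11086 | 40266 |
∑xy = 20319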
∑x = 244
∑y = 488
∑x² = 11086
∑y² = 40266
n = 6.
Put all the values in the Pearson’s correlation coefficient formula:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
R = [6(20319) – (244)(488)] / √{[6(11086) – (244)²][6(40266) – (488)²]}
R = 2842 / √[(6980)(3452)]
R = 2842 / 4908.66
R ≈ 0.5790
It shows that the relationship between the variables of the data is a strong positive relationship.
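Arithmetic slips in the XY column are easy to make by hand, so it helps to cross-check with the deviation-score form of the same coefficient, r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² Σ(y - ȳ)²]. A minimal Python sketch using the Problem 1 data:

```python
import math

ages = [42, 23, 22, 47, 50, 60]
glucose = [98, 68, 73, 79, 88, 82]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(glucose) / n

# Sum of cross-products of deviations, and sums of squared deviations
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(ages, glucose))
s_xx = sum((a - mean_x) ** 2 for a in ages)
s_yy = sum((b - mean_y) ** 2 for b in glucose)

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))  # 0.579
```

Because the means are subtracted first, this form also avoids the large intermediate sums of the raw-sums formula.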
Problem 2: Calculate the correlation coefficient from the following table:
SUBJECT | AGE (X) | Weight (Y) |
---|---|---|
1 | 40 | 99 |
2 | 25 | 79 |
3 | 22 | 69 |
4 | 54 | 89 |
Solution:
Make a table from the given data and add three more columns of XY, X², and Y².
SUBJECT | AGE (X) | Weight (Y) | XY | X² | Y² |
---|---|---|---|---|---|
1 | 40 | 99 | 3960 | 1600 | 9801 |
2 | 25 | 79 | 1975 | 625 | 6241 |
3 | 22 | 69 | 1518 | 484 | 4761 |
4 | 54 | 89 | 4806 | 2916 | 7921 |
∑ | 141 | 336 | 12259 | 5625 | 28724 |
∑xy = 12259
∑x = 141
∑y = 336
∑x² = 5625
∑y² = 28724
n = 4
Put all the values in the Pearson’s correlation coefficient formula:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
R = [4(12259) – (141)(336)] / √{[4(5625) – (141)²][4(28724) – (336)²]}
R = 1660 / √[(2619)(2000)]
R = 1660 / 2288.67
R ≈ 0.7253
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 3: Calculate the correlation coefficient for the following data:
X = 7,9,14 and Y = 17,19,21
Solution:
Given variables are,
X = 7,9,14
and,
Y = 17,19,21
To find the correlation coefficient, first construct a table of the values required by the formula, then add up each column.
X | Y | XY | X² | Y² |
---|---|---|---|---|
7 | 17 | 119 | 49 | 289 |
9 | 19 | 171 | 81 | 361 |
14 | 21 | 294 | 196 | 441 |
∑30 | ∑57 | ∑584 | ∑326 | ∑1091 |
∑xy = 584
∑x = 30
∑y = 57
∑x² = 326
∑y² = 1091
n = 3
Put all the values in the Pearson’s correlation coefficient formula:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
R = [3(584) – (30)(57)] / √{[3(326) – (30)²][3(1091) – (57)²]}
R = 42 / √[(78)(24)]
R = 42 / 43.2666
R ≈ 0.9707
It shows that the relationship between the variables of the data is a very strong positive relationship.
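The deviation-score form of Pearson's r gives an independent check that is quick for a data set this small: once the means are subtracted, the products and squares involve only small integers.

```latex
\bar{x} = \frac{7 + 9 + 14}{3} = 10, \qquad \bar{y} = \frac{17 + 19 + 21}{3} = 19

\sum (x - \bar{x})(y - \bar{y}) = (-3)(-2) + (-1)(0) + (4)(2) = 14

\sum (x - \bar{x})^2 = 9 + 1 + 16 = 26, \qquad \sum (y - \bar{y})^2 = 4 + 0 + 4 = 8

r = \frac{14}{\sqrt{26 \times 8}} = \frac{14}{\sqrt{208}} \approx 0.9707
```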
Problem 4: Calculate the correlation coefficient for the following data:
X = 21, 31, 25, 40, 47, 38 and Y = 70,55,60,78,66,80
Solution:
Given variables are,
X = 21,31,25,40,47,38
And,
Y = 70,55,60,78,66,80
To find the correlation coefficient, first construct a table of the values required by the formula, then add up each column.
X | Y | XY | X² | Y² |
---|---|---|---|---|
21 | 70 | 1470 | 441 | 4900 |
31 | 55 | 1705 | 961 | 3025 |
25 | 60 | 1500 | 625 | 3600 |
40 | 78 | 3120 | 1600 | 6084 |
47 | 66 | 3102 | 2209 | 4356 |
38 | 80 | 3040 | 1444 | 6400 |
∑202 | ∑409 | ∑13937 | ∑7280 | ∑28365 |
∑xy = 13937
∑x = 202
∑y = 409
∑x² = 7280
∑y² = 28365
n = 6
Put all the values in the Pearson’s correlation coefficient formula:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
R = [6(13937) – (202)(409)] / √{[6(7280) – (202)²][6(28365) – (409)²]}
R = 1004 / √[(2876)(2909)]
R = 1004 / 2892.4529
R ≈ 0.3471
It shows that the relationship between the variables of the data is a moderate positive relationship.
Problem 5: Calculate the correlation coefficient for the following data:
X = 5, 9, 14, 16 and Y = 6, 10, 16, 20.
Solution:
Given variables are,
X = 5, 9, 14, 16
And
Y = 6, 10, 16, 20.
To find the correlation coefficient, first construct a table of the values required by the formula, then add up each column.
X | Y | XY | X² | Y² |
---|---|---|---|---|
5 | 6 | 30 | 25 | 36 |
9 | 10 | 90 | 81 | 100 |
14 | 16 | 224 | 196 | 256 |
16 | 20 | 320 | 256 | 400 |
∑44 | ∑52 | ∑664 | ∑558 | ∑792 |
∑xy = 664
∑x = 44
∑y = 52
∑x² = 558
∑y² = 792
n = 4
Put all the values in the Pearson’s correlation coefficient formula:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
R = 4(664) – (44)(52) / √[4(558) – (44)²][4(792) – (52)²]
R = 368 / √[296][464]
R = 368/370.599
R = 0.9930
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 6: Calculate the correlation coefficient for the following data:
X = 10, 13, 15, 17, 19 and Y = 5, 10, 15, 20, 25.
Solution:
Given variables are,
X = 10, 13, 15, 17, 19 and Y = 5, 10, 15, 20, 25.
To find the correlation coefficient, first construct a table of the values required by the formula, then add up each column.
X | Y | XY | X² | Y² |
---|---|---|---|---|
10 | 5 | 50 | 100 | 25 |
13 | 10 | 130 | 169 | 100 |
15 | 15 | 225 | 225 | 225 |
17 | 20 | 340 | 289 | 400 |
19 | 25 | 475 | 361 | 625 |
∑74 | ∑75 | ∑1220 | ∑1144 | ∑1375 |
∑xy = 1220
∑x = 74
∑y = 75
∑x² = 1144
∑y² = 1375
n = 5
Put all the values in the Pearson’s correlation coefficient formula:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
R = [5(1220) – (74)(75)] / √{[5(1144) – (74)²][5(1375) – (75)²]}
R = 550 / √[(244)(1250)]
R = 550 / 552.27
R ≈ 0.9959
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 7: Calculate the correlation coefficient for the following data:
X = 12, 10, 42, 27, 35, 56 and Y = 13, 15, 56, 34, 65, 26
Solution:
Given variables are,
X = 12, 10, 42, 27, 35, 56 and Y = 13, 15, 56, 34, 65, 26
To find the correlation coefficient, first construct a table of the values required by the formula, then add up each column.
X | Y | XY | X² | Y² |
---|---|---|---|---|
12 | 13 | 156 | 144 | 169 |
10 | 15 | 150 | 100 | 225 |
42 | 56 | 2352 | 1764 | 3136 |
27 | 34 | 918 | 729 | 1156 |
35 | 65 | 2275 | 1225 | 4225 |
56 | 26 | 1456 | 3136 | 676 |
∑182 | ∑209 | ∑7307 | ∑7098 | ∑9587 |
∑xy = 7307
∑x = 182
∑y = 209
∑x² = 7098
∑y² = 9587
n = 6
Put all the values in the Pearson’s correlation coefficient formula:
[Tex] R~=~\frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} [/Tex]
R = 6(7307) – (182)(209) / √ {[6(7098) – (182)²][6(9587)-(209)²]}
R = 5804 / √[9464][13841]
R = 5804/11445.139
R = 0.5071
It shows that the relationship between the variables of the data is a strong positive relationship.
Summary – Correlation Coefficient Formula
The correlation coefficient serves as a statistical tool to assess the relationship between two variables in a dataset. Represented by the symbol r, its value ranges from -1 to 1, indicating the strength and direction of the linear association. A correlation of 1 signifies a perfect positive linear relationship, while -1 indicates a perfect negative linear relationship. A value of 0 implies no linear relationship. The formula for the correlation coefficient uses the number of data points, the sum of the products of paired values, and the sums and sums of squares of each variable. This coefficient shows how well one variable can be linearly predicted from the other, providing valuable insights in fields including economics, social sciences, and engineering.
FAQs on Correlation Coefficient Formula
How to calculate a correlation coefficient?
The correlation coefficient is the covariance of the two variables divided by the product of their standard deviations.
What is Correlation Calculated for?
Correlation is calculated to measure the strength and direction of the linear relationship between two variables.
What is an R value in statistics?
In statistics, r is Pearson's correlation coefficient, a measure of the linear trend between two variables.