Correlation Coefficient Formula

Correlation Coefficient Formula: The correlation coefficient is a statistical measure used to quantify the strength and direction of the relationship between two variables, such as predicted and observed values in a statistical analysis. It indicates how closely the two sets of values move together.

Correlation coefficients are used to measure how strong the relationship between two variables is. There are different types of correlation coefficients; one of the most popular is Pearson’s correlation (also known as Pearson’s r), which is commonly used in linear regression.

In this article, you will learn what correlation is, the correlation coefficient formula and its types, along with worked examples and practice problems.

Table of Contents

  • What is Correlation?
  • Correlation Coefficient Definition
  • What is Correlation Coefficient Formula?
    • Understanding Correlation Coefficient
  • Types of Correlation Coefficient Formula
    • Pearson’s Correlation Coefficient Formula
    • Sample Correlation Coefficient Formula
  • Population Correlation Coefficient Formula
  • Pearson’s Correlation
    • How to Find Pearson’s Correlation Coefficient?
  • Linear Correlation Coefficient
    • Cramer’s V Correlation
  • Correlation Coefficient Formula Problems

What is Correlation?

Correlation is a statistical measure that describes the extent to which two variables are related to each other. It quantifies the direction and strength of the linear relationship between them. A correlation between two variables is generally one of three types:

  • Positive Correlation
  • Zero Correlation
  • Negative Correlation


Correlation Coefficient Definition

The correlation coefficient is a statistical measure that quantifies the strength and direction of the linear relationship between two variables. It is usually denoted by the symbol ‘r’ and ranges from -1 to 1.

What is Correlation Coefficient Formula?

The correlation coefficient formula is used to determine how strong the relationship between two sets of data is. It yields a value between -1 and 1, where:

  • -1 indicates a perfect negative linear relationship
  • 1 indicates a perfect positive linear relationship
  • 0 indicates no linear relationship

Understanding Correlation Coefficient

  • A correlation coefficient of -1 means that for every increase in one variable, the other decreases by a fixed proportion. For example, in an idealized case, the amount of fuel in a tank decreases in perfect correlation with the distance driven.
  • A correlation coefficient of 1 means that for every increase in one variable, the other increases by a fixed proportion. For example, shoe size goes up in perfect correlation with foot length.
  • A correlation coefficient of 0 means that an increase in one variable is not accompanied by any linear increase or decrease in the other. The two are not linearly related, although a non-linear relationship may still exist.

Types of Correlation Coefficient Formula

The main types of correlation coefficient formula are:

Pearson’s Correlation Coefficient Formula

Pearson’s correlation coefficient for n pairs of values (x, y) is given by:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]
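As a sketch, the formula translates directly into Python using only the standard library. The function name pearson_r is illustrative, not part of any library. It raises an error when either variable is constant, because the denominator is then zero and r is undefined.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r via the computational (sums) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be equal-length sequences with at least 2 values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    # Each bracket equals n^2 times a population variance, so it is never negative
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator

# Data from Problem 5 later in this article
print(f"{pearson_r([5, 9, 14, 16], [6, 10, 16, 20]):.4f}")  # prints 0.9930
```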

Sample Correlation Coefficient Formula

The sample correlation coefficient is the sample covariance divided by the product of the sample standard deviations:

[Tex]r_{xy} = \frac{s_{xy}}{s_x s_y}[/Tex]

where,

  • s_xy is the sample covariance of x and y
  • s_x and s_y are the sample standard deviations of x and y

Population Correlation Coefficient Formula

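The population correlation coefficient is the population covariance divided by the product of the population standard deviations:

[Tex]\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}[/Tex]

where,

  • σx and σy are the population standard deviations
  • σxy is the population covariance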
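A sketch of the population version, using statistics.pstdev for σx and σy and computing σxy directly (the helper name population_r is illustrative). Because the 1/N factor in the covariance cancels against the factors in the two standard deviations, this gives the same value as the sample formula on the same data; the two formulas differ in what they describe (a whole population versus a sample), not in the number they produce.

```python
import statistics

def population_r(x, y):
    """rho_xy = sigma_xy / (sigma_x * sigma_y), using divide-by-N statistics."""
    n = len(x)
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    # Population covariance: average product of deviations (divides by N, not N - 1)
    sigma_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = statistics.pstdev(x)  # population standard deviation of x
    sigma_y = statistics.pstdev(y)  # population standard deviation of y
    return sigma_xy / (sigma_x * sigma_y)

x = [5, 9, 14, 16]
y = [6, 10, 16, 20]
print(f"{population_r(x, y):.4f}")  # prints 0.9930
```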

Pearson’s Correlation

It is the most commonly used correlation coefficient in statistics. Its full name is Pearson’s Product Moment Correlation (PPMC). It measures the linear relationship between two sets of data. Two symbols are used to represent the Pearson correlation:

the Greek letter “rho (ρ)” for a population correlation coefficient, and the letter “r” for a sample correlation coefficient.

How to Find Pearson’s Correlation Coefficient?

Follow the steps below to find Pearson’s correlation coefficient for any given data set.

Step 1: Make a chart of the given data with columns for subject, x, and y, and add three more columns: xy, x² and y².

Step 2: Multiply the x and y values in each row to fill the xy column. For example, if x is 24 and y is 65, then xy is 24 × 65 = 1560.

Step 3: Square each value in the x column to fill the x² column.

Step 4: Square each value in the y column to fill the y² column.

Step 5: Add up all the values in each column and write the totals at the bottom. The Greek letter sigma (Σ) is shorthand for summation.

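Step 6: Substitute the totals into the formula for Pearson’s correlation coefficient:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

The sign of R tells you whether the correlation is positive or negative, and its magnitude tells you how strong it is.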
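The six steps map onto a few lines of code. A minimal sketch in Python (standard library only; the function names correlation_table and r_from_totals are hypothetical), run on the data from Problem 7 later in this article:

```python
from math import sqrt

def correlation_table(x, y):
    """Steps 1-5: build the x, y, xy, x², y² columns and total each one."""
    rows = [(a, b, a * b, a * a, b * b) for a, b in zip(x, y)]
    # zip(*rows) turns the list of rows into columns; sum each column
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals

def r_from_totals(n, totals):
    """Step 6: substitute the column totals into Pearson's formula."""
    sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

x = [12, 10, 42, 27, 35, 56]
y = [13, 15, 56, 34, 65, 26]

rows, totals = correlation_table(x, y)
print("x", "y", "xy", "x²", "y²", sep="\t")
for row in rows:
    print(*row, sep="\t")
print(*totals, sep="\t")  # the ∑ row

r = r_from_totals(len(x), totals)
sign = "positive" if r > 0 else "negative" if r < 0 else "zero"
print(f"r = {r:.4f} ({sign} correlation)")
```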

Linear Correlation Coefficient

Pearson’s correlation coefficient is a linear correlation coefficient: it returns a value between -1 and +1, where -1 indicates a perfect negative correlation and +1 indicates a perfect positive correlation. A value of 0 means there is no linear correlation, which is also known as zero correlation.

The “crude estimates” below are a rough guide for interpreting the strength of correlations using Pearson’s r:

r Value | Crude Estimate
+.70 or higher | Very strong positive relationship
+.40 to +.69 | Strong positive relationship
+.30 to +.39 | Moderate positive relationship
+.20 to +.29 | Weak positive relationship
+.01 to +.19 | No or negligible relationship
0 | No relationship (zero correlation)
-.01 to -.19 | No or negligible relationship
-.20 to -.29 | Weak negative relationship
-.30 to -.39 | Moderate negative relationship
-.40 to -.69 | Strong negative relationship
-.70 or lower | Very strong negative relationship
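The crude-estimate table can be encoded as a simple lookup. A minimal sketch (describe_r is a hypothetical helper). Since the table leaves small gaps between rows (for example between +.69 and +.70), this version assumes each row's lower bound acts as a threshold, so a value such as 0.695 takes the weaker row's label. It also rejects anything outside [-1, 1], which cannot be a valid Pearson coefficient.

```python
def describe_r(r):
    """Map Pearson's r onto the crude-estimate labels.

    Assumption: row lower bounds are treated as thresholds, so values that
    fall between two rows take the label of the weaker row.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1 and +1")
    if r == 0:
        return "No relationship (zero correlation)"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    elif size >= 0.20:
        strength = "Weak"
    else:
        return "No or negligible relationship"
    return f"{strength} {direction} relationship"

for value in (0.9930, 0.5071, 0.3471, -0.25, 0.05):
    print(value, "->", describe_r(value))
```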

Cramer’s V Correlation

Cramer’s V is similar in spirit to Pearson’s correlation coefficient, but it measures the association between two categorical variables arranged in a contingency table, and it can be used for tables larger than 2×2. Cramer’s V varies between 0 and 1: a value close to 0 indicates very little association between the variables, and a value close to 1 indicates a very strong association.

The “crude estimates” for interpreting strengths of correlations using Cramer’s V Correlation:

Cramer’s V | Crude Estimate
.25 or higher | Very strong relationship
.15 to .25 | Strong relationship
.11 to .15 | Moderate relationship
.06 to .10 | Weak relationship
.01 to .05 | No or negligible relationship
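The article does not give a formula for Cramer’s V. The standard definition is V = √(χ² / (n(k - 1))), where χ² is Pearson’s chi-square statistic for the contingency table, n is the total count, and k is the smaller of the number of rows and columns. A sketch in plain Python (cramers_v is a hypothetical helper; the counts are made up for illustration, and every row and column total is assumed to be positive):

```python
from math import sqrt

def cramers_v(table):
    """Cramer's V for an r x c contingency table of observed counts.

    V = sqrt(chi2 / (n * (k - 1))), with k = min(rows, columns) and chi2
    computed from the expected counts under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return sqrt(chi2 / (n * (k - 1)))

# Hypothetical 2 x 3 table of counts, made up for illustration
counts = [[20, 15, 5],
          [10, 15, 35]]
print(f"{cramers_v(counts):.3f}")  # prints 0.477
```

On the crude-estimate scale above, a value of about 0.477 would count as a very strong relationship.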

Correlation Coefficient Formula Problems

Problem 1: Calculate the correlation coefficient from the following table:

Subject | Age (X) | Glucose Level (Y)
1 | 42 | 98
2 | 23 | 68
3 | 22 | 73
4 | 47 | 79
5 | 50 | 88
6 | 60 | 82

Solution:

Make a table from the given data and add three more columns: XY, X², and Y².

Subject | Age (X) | Glucose Level (Y) | XY | X² | Y²
1 | 42 | 98 | 4116 | 1764 | 9604
2 | 23 | 68 | 1564 | 529 | 4624
3 | 22 | 73 | 1606 | 484 | 5329
4 | 47 | 79 | 3713 | 2209 | 6241
5 | 50 | 88 | 4400 | 2500 | 7744
6 | 60 | 82 | 4920 | 3600 | 6724
∑ | 244 | 488 | 20319 | 11086 | 40266

∑xy = 20319

∑x = 244

∑y = 488

∑x² = 11086

∑y² = 40266

n = 6

Put all the values into Pearson’s correlation coefficient formula:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

R = [6(20319) - (244)(488)] / √{[6(11086) - (244)²][6(40266) - (488)²]}

R = 2842 / √([6980][3452])

R = 2842 / 4908.662

R = 0.5790

It shows that the relationship between the variables of the data is a strong positive relationship.
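A quick way to check a hand calculation is to rebuild the columns from the raw data. The sketch below (standard library only; variable names are illustrative) recomputes the XY column and R for Problem 1. Recomputing gives 60 × 82 = 4920 for the last row, ∑xy = 20319, and R ≈ 0.579.

```python
from math import sqrt

# Problem 1 data: age (X) and glucose level (Y) for subjects 1-6
x = [42, 23, 22, 47, 50, 60]
y = [98, 68, 73, 79, 88, 82]
n = len(x)

# Rebuild the XY column row by row to catch multiplication slips
xy = [a * b for a, b in zip(x, y)]
print(xy)  # [4116, 1564, 1606, 3713, 4400, 4920]

sum_x, sum_y, sum_xy = sum(x), sum(y), sum(xy)
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(numerator, f"{numerator / denominator:.4f}")  # 2842 0.5790
```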

Problem 2: Calculate the correlation coefficient from the following table:

Subject | Age (X) | Weight (Y)
1 | 40 | 99
2 | 25 | 79
3 | 22 | 69
4 | 54 | 89

Solution:

Make a table from the given data and add three more columns: XY, X², and Y².

Subject | Age (X) | Weight (Y) | XY | X² | Y²
1 | 40 | 99 | 3960 | 1600 | 9801
2 | 25 | 79 | 1975 | 625 | 6241
3 | 22 | 69 | 1518 | 484 | 4761
4 | 54 | 89 | 4806 | 2916 | 7921
∑ | 141 | 336 | 12259 | 5625 | 28724

∑xy = 12259

∑x = 141

∑y = 336

∑x² = 5625

∑y² = 28724

n = 4

Put all the values into Pearson’s correlation coefficient formula:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

R = [4(12259) - (141)(336)] / √{[4(5625) - (141)²][4(28724) - (336)²]}

R = 1660 / √([2619][2000])

R = 1660 / 2288.668

R = 0.7253

It shows that the relationship between the variables of the data is a very strong positive relationship.
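Recomputing from the raw data is a useful check here. Pearson’s r can never exceed 1 in absolute value (a consequence of the Cauchy-Schwarz inequality), so any result outside [-1, 1] signals an arithmetic slip; here, for instance, ∑x = 40 + 25 + 22 + 54 = 141. This sketch uses the deviation-score form of r, which is algebraically equal to the sums formula:

```python
from math import sqrt

# Problem 2 data: age (X) and weight (Y) for subjects 1-4
x = [40, 25, 22, 54]
y = [99, 79, 69, 89]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
# Deviation-score form: sums of products and squares of deviations from the mean
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)
r = s_xy / sqrt(s_xx * s_yy)

print(sum(x), f"{r:.4f}")  # 141 0.7253
assert -1 <= r <= 1  # any |r| > 1 would mean an arithmetic mistake
```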

Problem 3: Calculate the correlation coefficient for the following data:

X = 7, 9, 14 and Y = 17, 19, 21

Solution:

Given variables are,

X = 7, 9, 14

and,

Y = 17, 19, 21

To find the correlation coefficient, first construct a table of the values required in the formula and add up each column:

X | Y | XY | X² | Y²
7 | 17 | 119 | 49 | 289
9 | 19 | 171 | 81 | 361
14 | 21 | 294 | 196 | 441
∑30 | ∑57 | ∑584 | ∑326 | ∑1091

∑xy = 584

∑x = 30

∑y = 57

∑x² = 326

∑y² = 1091

n = 3

Put all the values into Pearson’s correlation coefficient formula:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

R = [3(584) - (30)(57)] / √{[3(326) - (30)²][3(1091) - (57)²]}

R = 42 / √([78][24])

R = 42 / 43.267

R = 0.9707

It shows that the relationship between the variables of the data is a very strong positive relationship.

Problem 4: Calculate the correlation coefficient for the following data:

X = 21, 31, 25, 40, 47, 38 and Y = 70, 55, 60, 78, 66, 80

Solution:

Given variables are,

X = 21, 31, 25, 40, 47, 38

And,

Y = 70, 55, 60, 78, 66, 80

To find the correlation coefficient, first construct a table of the values required in the formula and add up each column:

X | Y | XY | X² | Y²
21 | 70 | 1470 | 441 | 4900
31 | 55 | 1705 | 961 | 3025
25 | 60 | 1500 | 625 | 3600
40 | 78 | 3120 | 1600 | 6084
47 | 66 | 3102 | 2209 | 4356
38 | 80 | 3040 | 1444 | 6400
∑202 | ∑409 | ∑13937 | ∑7280 | ∑28365

∑xy = 13937

∑x = 202

∑y = 409

∑x² = 7280

∑y² = 28365

n = 6

Put all the values into Pearson’s correlation coefficient formula:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

R = [6(13937) - (202)(409)] / √{[6(7280) - (202)²][6(28365) - (409)²]}

R = 1004 /√[2876][2909]        

R = 1004 / 2892.452938

R = 0.3471

It shows that the relationship between the variables of the data is a moderate positive relationship.
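Keeping the calculation in integers avoids rounding in the intermediate terms. This sketch recomputes the numerator and the two bracketed terms exactly (note that 78² = 6084, so ∑y² = 28365) and only converts to floating point at the final division:

```python
from math import sqrt

x = [21, 31, 25, 40, 47, 38]
y = [70, 55, 60, 78, 66, 80]
n = len(x)

# Integer arithmetic keeps the numerator and both brackets exact
numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
term_x = n * sum(a * a for a in x) - sum(x) ** 2
term_y = n * sum(b * b for b in y) - sum(y) ** 2
print(numerator, term_x, term_y)  # 1004 2876 2909

r = numerator / sqrt(term_x * term_y)
print(f"{r:.4f}")  # 0.3471
```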

Problem 5: Calculate the correlation coefficient for the following data:

X = 5, 9, 14, 16 and Y = 6, 10, 16, 20

Solution:

Given variables are,

X = 5, 9, 14, 16

And

Y = 6, 10, 16, 20

To find the correlation coefficient, first construct a table of the values required in the formula and add up each column:

X | Y | XY | X² | Y²
5 | 6 | 30 | 25 | 36
9 | 10 | 90 | 81 | 100
14 | 16 | 224 | 196 | 256
16 | 20 | 320 | 256 | 400
∑44 | ∑52 | ∑664 | ∑558 | ∑792

∑xy = 664

∑x = 44

∑y = 52

∑x² = 558

∑y² = 792

n = 4

Put all the values into Pearson’s correlation coefficient formula:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

R = [4(664) - (44)(52)] / √{[4(558) - (44)²][4(792) - (52)²]}

R = 368 / √[296][464]

R = 368/370.599

R = 0.9930

It shows that the relationship between the variables of the data is a very strong positive relationship.
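Another route to the same value is through standardized scores: when the sample standard deviation is used, r equals the sum of the products of the z-scores divided by n - 1. A sketch using the standard library statistics module:

```python
import statistics

x = [5, 9, 14, 16]
y = [6, 10, 16, 20]
n = len(x)

# Standardize each variable with its sample mean and sample standard deviation
mx, sx = statistics.fmean(x), statistics.stdev(x)
my, sy = statistics.fmean(y), statistics.stdev(y)
zx = [(a - mx) / sx for a in x]
zy = [(b - my) / sy for b in y]

# With sample (n - 1) standard deviations, r = sum(zx * zy) / (n - 1)
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(f"{r:.4f}")  # 0.9930
```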

Problem 6: Calculate the correlation coefficient for the following data:

X = 10, 13, 15, 17, 19 and Y = 5, 10, 15, 20, 25

Solution:

Given variables are,

X = 10, 13, 15, 17, 19 and Y = 5, 10, 15, 20, 25

To find the correlation coefficient, first construct a table of the values required in the formula and add up each column:

X | Y | XY | X² | Y²
10 | 5 | 50 | 100 | 25
13 | 10 | 130 | 169 | 100
15 | 15 | 225 | 225 | 225
17 | 20 | 340 | 289 | 400
19 | 25 | 475 | 361 | 625
∑74 | ∑75 | ∑1220 | ∑1144 | ∑1375

∑xy = 1220

∑x = 74

∑y = 75

∑x² = 1144

∑y² = 1375

n = 5

Put all the values into Pearson’s correlation coefficient formula:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

R = [5(1220) - (74)(75)] / √{[5(1144) - (74)²][5(1375) - (75)²]}

R = 550 / √([244][1250])

R = 550 / 552.268

R = 0.9959

It shows that the relationship between the variables of the data is a very strong positive relationship.
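Because every Y value here is a multiple of 5, the hand calculation can be simplified by dividing Y by 5 first: Pearson’s r does not change when a variable is multiplied by a positive constant or shifted by a constant. A sketch that checks this (pearson_r is a hypothetical helper implementing the sums formula used in these problems):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r via the sums formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

x = [10, 13, 15, 17, 19]
y = [5, 10, 15, 20, 25]
y_scaled = [b // 5 for b in y]  # 1, 2, 3, 4, 5: the same data divided by 5

print(f"{pearson_r(x, y):.4f}")         # 0.9959
print(f"{pearson_r(x, y_scaled):.4f}")  # unchanged by the rescaling
```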

Problem 7: Calculate the correlation coefficient for the following data:

X = 12, 10, 42, 27, 35, 56 and Y = 13, 15, 56, 34, 65, 26

Solution:

Given variables are,

X = 12, 10, 42, 27, 35, 56 and Y = 13, 15, 56, 34, 65, 26

To find the correlation coefficient, first construct a table of the values required in the formula and add up each column:

X | Y | XY | X² | Y²
12 | 13 | 156 | 144 | 169
10 | 15 | 150 | 100 | 225
42 | 56 | 2352 | 1764 | 3136
27 | 34 | 918 | 729 | 1156
35 | 65 | 2275 | 1225 | 4225
56 | 26 | 1456 | 3136 | 676
∑182 | ∑209 | ∑7307 | ∑7098 | ∑9587

∑xy = 7307

∑x = 182

∑y = 209

∑x² = 7098

∑y² = 9587

n = 6

Put all the values into Pearson’s correlation coefficient formula:

[Tex]R = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}[/Tex]

R = [6(7307) - (182)(209)] / √{[6(7098) - (182)²][6(9587) - (209)²]}

R = 5804 / √[9464][13841] 

R = 5804/11445.139

R = 0.5071

It shows that the relationship between the variables of the data is a strong positive relationship.

Summary – Correlation Coefficient Formula

The correlation coefficient serves as a statistical tool to assess the relationship between two variables in a dataset. Represented by the symbol r, its value ranges from -1 to 1, indicating the strength and direction of the linear association. A correlation of 1 signifies a perfect positive linear relationship, while -1 indicates a perfect negative linear relationship. A value of 0 implies no linear relationship. The formula for the correlation coefficient uses the number of data points, the sum of the products of corresponding values of the two variables, and the sums and sums of squares of each variable. This coefficient helps in understanding the extent to which one variable can predict the other, providing valuable insights in fields including economics, social sciences, and engineering.

FAQs on Correlation Coefficient Formula

How to calculate a correlation coefficient?

The correlation coefficient is the covariance of the two variables divided by the product of their standard deviations.

What is Correlation Calculated for?

Correlation is calculated to measure the strength and direction of the relationship between two variables.

What is an R value in statistics?

In statistics, R (or r) usually refers to the Pearson correlation coefficient, a measure of the linear trend between two variables.


