Comparing the means of more than two groups

Two main techniques are used to compare the means of more than two groups:

  • Analysis of Variance (ANOVA)
    • One way ANOVA
    • Two way ANOVA
    • MANOVA Test
  • Kruskal–Wallis Test

One way ANOVA

The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is an extension of the independent two-sample t-test for comparing means when there are more than two groups. In one-way ANOVA, the data is organized into several groups based on a single grouping variable.

Implementation in R:

For performing the one-way analysis of variance (ANOVA) in R, use the function aov(). The function summary.aov() is used to summarize the analysis of the variance model. The syntax for the function is given below.

Syntax: aov(formula, data = NULL)

Parameters:

  • formula: A formula specifying the model.
  • data: A data frame in which the variables specified in the formula will be found
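As a minimal sketch of how the two arguments fit together, the same model can be written either with columns referenced directly in the formula or with bare column names that are looked up in data. The object names fit_vec and fit_df are illustrative only, and the built-in mtcars data set is assumed here for convenience.

```r
# Same one-way model written two ways (object names are illustrative)
fit_vec <- aov(mtcars$disp ~ factor(mtcars$gear))   # columns referenced directly
fit_df  <- aov(disp ~ factor(gear), data = mtcars)  # columns looked up in `data`

# Extract the F statistic of the grouping term from each summary table
f_vec <- summary(fit_vec)[[1]][["F value"]][1]
f_df  <- summary(fit_df)[[1]][["F value"]][1]

# Both fits describe the same model, so the statistics should agree;
# only the row label in the printed table differs
print(c(f_vec, f_df))
```

Passing the data frame through data keeps the formula short and produces cleaner row labels in the summary table.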

Example:

A one-way ANOVA test is performed on the mtcars dataset, which is built into R (datasets package), with disp, a continuous attribute, as the response and gear, a categorical attribute, as the grouping variable.

R




# R program to illustrate
# one way ANOVA test

# mtcars is a built-in R dataset, so no package needs to be loaded

# Fit the model: disp (continuous) grouped by gear (treated as a factor)
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
print(summary(mtcars_aov))


Output:

                    Df Sum Sq Mean Sq F value   Pr(>F)    
factor(mtcars$gear)  2 280221  140110   20.73 2.56e-06 ***
Residuals           29 195964    6757                     

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The summary shows that gear has a highly significant effect on displacement (marked with three stars). The p-value (2.56e-06) is far below 0.05, so we reject the null hypothesis that mean displacement is the same for every gear group: displacement and gear are related.
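To make the F value in the table less of a black box, here is a minimal sketch that rebuilds it from the between-group and within-group sums of squares. It assumes the same mtcars columns as the example; all variable names are illustrative.

```r
# Recompute the one-way ANOVA F statistic by hand (names are illustrative)
y <- mtcars$disp
g <- factor(mtcars$gear)

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)     # one mean per gear level
group_n     <- tapply(y, g, length)   # group sizes

# Between-group SS: spread of the group means around the grand mean
ss_between <- sum(group_n * (group_means - grand_mean)^2)
# Within-group SS: spread of observations around their own group mean
ss_within  <- sum((y - ave(y, g))^2)

df_between <- nlevels(g) - 1
df_within  <- length(y) - nlevels(g)

# F is the ratio of the two mean squares
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```

A large F means the group means differ by more than the scatter inside the groups would explain, which is what the small p-value reflects.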

Two way ANOVA

Two-way ANOVA is used to evaluate simultaneously the effect of two grouping variables (A and B) on a response variable. Both grouping variables are categorical, and the response is continuous.

Implementation in R:

For performing the two-way analysis of variance (ANOVA) in R, also use the function aov(). The function summary.aov() is used to summarize the analysis of variance model. The syntax for the function is given below.

Syntax: aov(formula, data = NULL)

Parameters:

  • formula: A formula specifying the model.
  • data: A data frame in which the variables specified in the formula will be found

Example: A two-way ANOVA test is performed on the mtcars dataset, which is built into R (datasets package), with disp, a continuous attribute, as the response and gear and am, both categorical attributes, as the grouping variables.

R




# R program to illustrate
# two way ANOVA test

# mtcars is a built-in R dataset, so no package needs to be loaded

# Fit the model; `*` requests both main effects and their interaction
mtcars_aov2 <- aov(mtcars$disp ~ factor(mtcars$gear) *
                   factor(mtcars$am))
print(summary(mtcars_aov2))


 Output:

                    Df Sum Sq Mean Sq F value   Pr(>F)    
factor(mtcars$gear)  2 280221  140110  20.695 3.03e-06 ***
factor(mtcars$am)    1   6399    6399   0.945    0.339    
Residuals           28 189565    6770                     

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

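The summary shows that gear has a highly significant effect on displacement (p = 3.03e-06, marked with three stars), so mean displacement differs between gear groups. The am term is not significant (p = 0.339): because aov() uses sequential sums of squares, am is tested after gear, and once gear is accounted for there is no evidence that am explains additional variation in displacement. No interaction row appears even though the formula uses *; in mtcars not every combination of gear and am occurs, so the interaction has no degrees of freedom left to estimate.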
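The output above contains no interaction row even though the formula uses *. A quick cross-tabulation, sketched below, suggests why: some gear and am combinations have no cars at all. The comparison with an additive model (+ instead of *) is an illustrative check, and the object names are hypothetical.

```r
# Cross-tabulate the two factors: rows are gear, columns are am
cells <- table(gear = mtcars$gear, am = mtcars$am)
print(cells)

# Fit the additive model (main effects only) and the `*` model
fit_add <- aov(disp ~ factor(gear) + factor(am), data = mtcars)
fit_int <- aov(disp ~ factor(gear) * factor(am), data = mtcars)

# If the interaction adds no estimable terms, both fits
# leave the same residual degrees of freedom
print(c(additive = df.residual(fit_add), interaction = df.residual(fit_int)))
```

When a design has empty cells like this, it is worth checking the table before interpreting or reporting an interaction effect.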

MANOVA Test

Multivariate analysis of variance (MANOVA) is an ANOVA (analysis of variance) with several dependent variables, and so is an extension of ANOVA. In an ANOVA, we test for statistical differences in one continuous dependent variable across the levels of an independent grouping variable. MANOVA extends this analysis by taking multiple continuous dependent variables and bundling them into a weighted linear composite variable. It then tests whether this composite differs across the levels, or groups, of the independent variable.

Implementation in R:

R provides the function manova() to perform the MANOVA test. The class “manova” differs from class “aov” in selecting a different summary method. The function manova() calls aov() and then adds the class “manova” to the result object for each stratum.

Syntax: manova(formula, data = NULL, projections = FALSE, qr = TRUE, contrasts = NULL, …)

Parameters: 

  • formula: A formula specifying the model.
  • data: A data frame in which the variables specified in the formula will be found. If missing, the variables are searched for in the standard way.
  • projections: Logical flag: should the projections be returned?
  • qr: Logical flag: should the QR decomposition be returned?
  • contrasts: A list of contrasts to be used for some of the factors in the formula.
  • …: Arguments to be passed to lm, such as subset or na.action.

Example: To perform the MANOVA test in R, let’s use the built-in iris data set.

R




# R program to illustrate
# MANOVA test
 
# Import required library
library(dplyr)
 
# Taking iris data set
myData = iris
 
# Show a random sample
set.seed(1234)
dplyr::sample_n(myData, 10)


 Output:

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1           5.5         2.5          4.0         1.3 versicolor
2           5.6         2.5          3.9         1.1 versicolor
3           6.0         2.9          4.5         1.5 versicolor
4           6.4         3.2          5.3         2.3  virginica
5           4.3         3.0          1.1         0.1     setosa
6           7.2         3.2          6.0         1.8  virginica
7           5.9         3.0          4.2         1.5 versicolor
8           4.6         3.1          1.5         0.2     setosa
9           7.9         3.8          6.4         2.0  virginica
10          5.1         3.4          1.5         0.2     setosa

To find out whether sepal length and petal length differ significantly between the species, perform a MANOVA test. The function manova() can be used as follows.

R




# MANOVA test with two dependent variables,
# Sepal.Length and Petal.Length, bound together with cbind()
result = manova(cbind(Sepal.Length, Petal.Length) ~ Species,
                data = iris)
summary(result)


 Output:

           Df Pillai approx F num Df den Df    Pr(>F)    
Species     2 0.9885   71.829      4    294 < 2.2e-16 ***
Residuals 147                                            

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

From the output above, the p-value (< 2.2e-16) is far below 0.05, so sepal length and petal length, taken together, differ significantly among species.
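The table above reports Pillai's trace, which is the default test in summary() for a manova object. As a hedged sketch of two common follow-ups, the same fit can be summarized with a different multivariate statistic through the test argument, and summary.aov() can be used to see a separate univariate ANOVA for each dependent variable. The object name fit is illustrative.

```r
# Refit the MANOVA from the example (object name is illustrative)
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)

# Same model, summarized with Wilks' lambda instead of Pillai's trace
print(summary(fit, test = "Wilks"))

# One univariate ANOVA table per dependent variable
univariate <- summary.aov(fit)
print(univariate)

# The default statistic can also be pulled out of the summary object
pillai <- summary(fit)$stats["Species", "Pillai"]
print(pillai)
```

The univariate tables show which of the individual variables contribute to the multivariate difference, which the overall MANOVA test alone does not reveal.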

Kruskal–Wallis test 

The Kruskal–Wallis test is a rank-based test that extends the Mann–Whitney U test (the two-sample Wilcoxon rank-sum test) to one-way data with more than two groups. It is a non-parametric alternative to the one-way ANOVA test. The samples must be independent, meaning they come from unrelated populations and do not affect each other. Using the Kruskal–Wallis test, it can be decided whether the population distributions are identical without assuming that they follow a normal distribution.

Implementation in R:

R provides the function kruskal.test(), available in the stats package, to perform a Kruskal–Wallis rank-sum test.

Syntax: kruskal.test(x, g, …) or kruskal.test(formula, data, subset, na.action, …)

Parameters:

  • x: a numeric vector of data values, or a list of numeric data vectors.
  • g: a vector or factor object giving the group for the corresponding elements of x.
  • formula: a formula of the form response ~ group where response gives the data values and group a vector or factor of the corresponding groups.
  • data: an optional matrix or data frame containing the variables in the formula.
  • subset: an optional vector specifying a subset of observations to be used.
  • na.action: a function which indicates what should happen when the data contain NA.
  • …: further arguments to be passed to or from methods.
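The parameters above correspond to different ways of handing the same data to kruskal.test(). The sketch below, which assumes the built-in PlantGrowth data set and uses illustrative object names, calls the function through the formula interface, the x and g vectors, and a list of numeric vectors, and then compares the resulting statistics.

```r
# Three equivalent ways to call kruskal.test() (object names are illustrative)

# 1. Formula interface: response ~ group, with columns looked up in `data`
res_formula <- kruskal.test(weight ~ group, data = PlantGrowth)

# 2. Data vector x plus grouping factor g
res_vectors <- kruskal.test(PlantGrowth$weight, PlantGrowth$group)

# 3. x given as a list with one numeric vector per group (g is not needed)
res_list <- kruskal.test(split(PlantGrowth$weight, PlantGrowth$group))

# The test statistic should be identical across the three calls
stats <- c(formula = unname(res_formula$statistic),
           vectors = unname(res_vectors$statistic),
           list    = unname(res_list$statistic))
print(stats)
```

The formula interface is usually the most readable when the data already lives in a data frame.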

Example: Let’s use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under control and two different treatment conditions. 

R




# Preparing the data set
# to perform Kruskal-Wallis Test
 
# Taking the PlantGrowth data set
myData = PlantGrowth
print(myData)
 
# Show the group levels
print(levels(myData$group))


Output:

   weight group
1    4.17  ctrl
2    5.58  ctrl
3    5.18  ctrl
4    6.11  ctrl
5    4.50  ctrl
6    4.61  ctrl
7    5.17  ctrl
8    4.53  ctrl
9    5.33  ctrl
10   5.14  ctrl
11   4.81  trt1
12   4.17  trt1
13   4.41  trt1
14   3.59  trt1
15   5.87  trt1
16   3.83  trt1
17   6.03  trt1
18   4.89  trt1
19   4.32  trt1
20   4.69  trt1
21   6.31  trt2
22   5.12  trt2
23   5.54  trt2
24   5.50  trt2
25   5.37  trt2
26   5.29  trt2
27   4.92  trt2
28   6.15  trt2
29   5.80  trt2
30   5.26  trt2
[1] "ctrl" "trt1" "trt2"

Here the column “group” is a factor, and its categories (“ctrl”, “trt1”, “trt2”) are the factor levels, which are ordered alphabetically. We want to know whether plant weight differs significantly among the three experimental conditions. The test can be performed using the function kruskal.test() as given below.

R




# R program to illustrate
# Kruskal-Wallis Test
 
# Taking the PlantGrowth data set
myData = PlantGrowth
 
# Performing Kruskal-Wallis test
result = kruskal.test(weight ~ group,
                      data = myData)
print(result)


 Output:

	Kruskal-Wallis rank sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842

As the p-value (0.01842) is less than the significance level 0.05, it can be concluded that there are significant differences between the treatment groups. The test itself does not indicate which groups differ.
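To show what “rank-based” means in practice, here is a minimal sketch that recomputes the statistic by hand: rank all observations together, sum the ranks per group, apply the H formula, and correct for ties. Variable names are illustrative, and the tie correction is included because PlantGrowth contains repeated weights.

```r
# Recompute the Kruskal-Wallis statistic by hand (names are illustrative)
w <- PlantGrowth$weight
g <- PlantGrowth$group
n <- length(w)

# Rank all observations together; tied values get the average rank
r <- rank(w)
rank_sums <- tapply(r, g, sum)      # sum of ranks in each group
group_n   <- tapply(r, g, length)   # group sizes

# Uncorrected H statistic
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_n) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n),
# where t is the size of each set of tied values
ties <- table(w)
h <- h_raw / (1 - sum(ties^3 - ties) / (n^3 - n))

# Compare against a chi-squared distribution with (groups - 1) df
p <- pchisq(h, df = nlevels(g) - 1, lower.tail = FALSE)
print(c(H = h, p = p))
```

Because only the ranks enter the calculation, extreme weights influence the result far less than they would in an ANOVA on the raw values.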


