Add line for average per group using ggplot2 package in R
In this article, we will discuss how to add a line for average per group in a scatter plot in the R Programming Language.
In R, we can do so by first computing a data frame of group means with the group_by() and summarise() functions of the dplyr package. We then pass that data frame to the geom_hline() function of the ggplot2 package to draw a horizontal line at each group's mean, colored by group.
To create the data frame of group means from the original data frame:
Syntax:
mean_df <- df %>% group_by( <categorical-variable> ) %>% summarise( mean_val = mean( <quantitative-variable> ) )
Arguments:
- df: the data frame to be used.
- <categorical-variable>: the variable used to divide the data into groups.
- <quantitative-variable>: the variable whose mean is computed for each group.
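As a quick illustration of the syntax above, here is a minimal sketch on a tiny hand-made data frame where the group means are easy to check by eye. The names toy, grp, val, and toy_means are hypothetical and chosen only for this example; it loads dplyr directly rather than the whole tidyverse, which is an assumption about what is installed.

```r
# Minimal sketch: group means on a small, hand-made data frame.
# `toy`, `grp`, `val`, and `toy_means` are hypothetical names for illustration.
library(dplyr)

toy <- data.frame(
  grp = c("a", "a", "b", "b"),
  val = c(1, 3, 10, 20)
)

# One row per group; the mean of `val` is stored in the column `mean_val`
toy_means <- toy %>%
  group_by(grp) %>%
  summarise(mean_val = mean(val))

print(toy_means)
```

The result has one row per group (mean 2 for "a" and 15 for "b"), which is the shape geom_hline() needs later on.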
The group_by() and summarise() expression above creates a data frame (a tibble) with two columns: <categorical-variable> and mean_val, which stores the mean for each category. Now, we pass this data frame to the geom_hline() function to add a horizontal line at the mean of each group, colored by the categorical variable.
Syntax:
plot + geom_hline( data = mean_df, aes( yintercept = <mean-column>, col = <categorical-variable> ) )
Arguments:
- mean_df: the data frame that contains the group means. It must be passed as data = mean_df, because the first positional argument of geom_hline() is the mapping.
- yintercept: the column of mean_df that holds the group means.
- col: the categorical variable by which the lines are colored.
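To make the arguments above concrete, the following sketch builds a small plot and then inspects where the horizontal lines end up. The data frame toy and its columns are hypothetical; layer_data() is a ggplot2 helper that returns the computed data for a given layer, used here only to check the result without looking at the picture.

```r
# Minimal sketch: one horizontal line per group, checked with layer_data().
# `toy`, `grp`, `x`, `val`, and `toy_means` are hypothetical names for illustration.
library(dplyr)
library(ggplot2)

toy <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, 2, 3, 4),
  val = c(1, 3, 10, 20)
)

toy_means <- toy %>%
  group_by(grp) %>%
  summarise(mean_val = mean(val))

p <- ggplot(toy, aes(x = x, y = val)) +
  geom_point(aes(colour = grp)) +
  # `data` is passed by name because the first positional argument is `mapping`
  geom_hline(data = toy_means, aes(yintercept = mean_val, colour = grp))

# Layer 1 is geom_point(), layer 2 is geom_hline()
hline_positions <- layer_data(p, 2)$yintercept
print(hline_positions)
```

Using the same variable for the colour of the points and of the lines means both layers share one legend and one color scale.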
Example 1:
In this example, we create a scatter plot colored by a categorical variable. Then we add one horizontal line per category, colored by the same variable, at the mean of that category's data.
R
# load the tidyverse (provides dplyr and ggplot2)
library(tidyverse)

# create data frame with three groups of 100 points each
df <- data.frame(
  group = factor(rep(c("category1", "category2", "category3"), each = 100)),
  y = round(c(rnorm(100, mean = 65, sd = 5),
              rnorm(100, mean = 85, sd = 5),
              rnorm(100, mean = 105, sd = 5))),
  x = rnorm(300)
)

# create mean by group
# (named mean_df so that it does not mask the base function mean())
mean_df <- df %>%
  group_by(group) %>%
  summarise(mean_val = mean(y))

# create ggplot scatter plot
# add horizontal line overlay at mean using geom_hline()
ggplot(data = df, aes(x = x, y = y)) +
  geom_point(aes(colour = group)) +
  geom_hline(data = mean_df, aes(yintercept = mean_val, col = group))
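Output:
A scatter plot of y against x with points colored by group and three horizontal lines, one at each group's mean of y (near 65, 85, and 105; exact values vary because the data is random).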
Example 2:
In this example, we again create a scatter plot colored by a categorical variable and add one horizontal line per category, colored by the same variable, at the mean of that category's data. We also add facet_grid() to split the plot into one panel per category, which makes it easier to compare the groups.
R
# load the tidyverse (provides dplyr and ggplot2)
library(tidyverse)

# create data frame with three groups of 100 points each
df <- data.frame(
  group = factor(rep(c("category1", "category2", "category3"), each = 100)),
  y = round(c(rnorm(100, mean = 65, sd = 5),
              rnorm(100, mean = 55, sd = 5),
              rnorm(100, mean = 60, sd = 5))),
  x = rnorm(300)
)

# create mean by group
# (named mean_df so that it does not mask the base function mean())
mean_df <- df %>%
  group_by(group) %>%
  summarise(mean_val = mean(y))

# create ggplot scatter plot
# add horizontal line overlay at mean using geom_hline()
# divide plot in facet using function facet_grid()
ggplot(data = df, aes(x = x, y = y)) +
  geom_point(aes(colour = group)) +
  geom_hline(data = mean_df, aes(yintercept = mean_val, col = group)) +
  facet_grid(~group)
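Output:
A faceted scatter plot with one panel per group; each panel shows that group's points and a single horizontal line at the group's mean of y (near 65, 55, and 60; exact values vary because the data is random).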
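One detail worth checking in the faceted version is that each mean line is drawn only in its own panel. This works because the data frame of means contains the same grouping column that facet_grid() splits on. The sketch below, on a hypothetical toy data frame, inspects the panel assignment of the geom_hline() layer with layer_data().

```r
# Minimal sketch: each mean line lands in exactly one facet panel.
# `toy`, `grp`, `x`, `val`, and `toy_means` are hypothetical names for illustration.
library(dplyr)
library(ggplot2)

toy <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, 2, 3, 4),
  val = c(1, 3, 10, 20)
)

toy_means <- toy %>%
  group_by(grp) %>%
  summarise(mean_val = mean(val))

p <- ggplot(toy, aes(x = x, y = val)) +
  geom_point(aes(colour = grp)) +
  geom_hline(data = toy_means, aes(yintercept = mean_val, colour = grp)) +
  facet_grid(~grp)

# Layer 2 is geom_hline(); PANEL tells us which facet each line is drawn in
hline_data <- layer_data(p, 2)
print(hline_data[, c("PANEL", "yintercept")])
```

If the data frame of means did not include the faceting column, ggplot2 would repeat every line in every panel, so keeping that column in the summary is what ties each line to its facet.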