Pair Plot Using Tidyverse in R

The tidyverse is a collection of R packages designed for exploratory data analysis, visualization, and data manipulation. It includes ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, and many others. The complete list can be obtained with the code below.

R




# Install the tidyverse package
install.packages("tidyverse")

# Load the package
library(tidyverse)

# List all packages in tidyverse
tidyverse_packages()


Output:

 [1] "broom"         "conflicted"    "cli"           "dbplyr"        "dplyr"        
 [6] "dtplyr"        "forcats"       "ggplot2"       "googledrive"   "googlesheets4"
[11] "haven"         "hms"           "httr"          "jsonlite"      "lubridate"    
[16] "magrittr"      "modelr"        "pillar"        "purrr"         "ragg"         
[21] "readr"         "readxl"        "reprex"        "rlang"         "rstudioapi"   
[26] "rvest"         "stringr"       "tibble"        "tidyr"         "xml2"         
[31] "tidyverse" 

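Calling library(tidyverse) attaches the core packages (such as ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats) in one step, so we do not have to load each of them individually. The remaining packages in the list are installed along with tidyverse but must still be loaded with their own library() call when needed.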
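
A quick way to check which of the listed packages library(tidyverse) actually attaches is to compare the list with the current search path. The following is a minimal sketch, assuming a fresh R session with tidyverse installed; the variable names are illustrative.

```r
library(tidyverse)

# Every package that belongs to the tidyverse
all_pkgs <- tidyverse_packages()

# Packages currently attached to the search path
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))

# Tidyverse packages that are attached right now
core_attached <- intersect(all_pkgs, attached)
print(core_attached)

# Tidyverse packages that are installed but not attached
print(setdiff(all_pkgs, attached))
```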

We will use chickwts, a dataset built into R that records the weights of chicks along with the type of feed each one was given.
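
Before plotting, it can help to look at the column types, because a pair plot treats numeric and categorical columns differently. A short sketch using only base R (the variable names are illustrative):

```r
# Compact overview: column types and the first few values
str(chickwts)

# Number of observations and the distinct feed types
n_obs <- nrow(chickwts)
feed_levels <- levels(chickwts$feed)
print(n_obs)
print(feed_levels)

# Observations per feed type
print(table(chickwts$feed))
```

Here weight is numeric and feed is a factor, which matters for the plots that follow.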

R




# Printing the available columns in dataset
print(colnames(chickwts))
 
# Total count of columns
print(paste("Total no. of columns : ", length(colnames(chickwts))))


Output :

[1] "weight" "feed"
[1] "Total no. of columns :  2"

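Since the dataset has only 2 columns, the base R pairs() function draws a 2 × 2 matrix: the diagonal cells show the variable names, and the two off-diagonal cells show scatter plots of weight against feed (mirror images of each other).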

R




# Simple scatter pair plot
pairs(chickwts)


Output:

Simple scatter pair-plot
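
Because feed is a factor, pairs() plots it through its integer codes, which is why the points line up in vertical or horizontal bands. The sketch below shows the codes and, as an alternative for numeric-versus-factor data, a base R box plot; the variable name feed_codes is illustrative.

```r
# pairs() needs numeric input, so the factor column is replaced by its integer codes
feed_codes <- as.integer(chickwts$feed)

# Map each integer code back to its feed label
print(data.frame(code = seq_along(levels(chickwts$feed)),
                 feed = levels(chickwts$feed)))

# A box plot of weight per feed type, using base R graphics
boxplot(weight ~ feed, data = chickwts,
        xlab = "feed", ylab = "weight")
```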

Now we will build the same plot with tidyverse packages to get a cleaner look.

Step 1: Preparing the Dataset

First, we need to clean the dataset of NA (Not Available) values by removing every row that contains NA in any column. We will use the drop_na() function of the tidyr package for this.

R




library(tidyverse)
 
# Storing our chickwts dataset in a different variable
chickwts_df <- chickwts
 
# Checking for existence of NA values and removing these rows
if (sum(is.na(chickwts_df)) > 0) {
  chickwts_df <- chickwts_df %>% drop_na()
  print("Cleaned the dataset of NA values")
} else {
  print("No NA values found.")
}


Output:

[1] "No NA values found."

The is.na() function returns a matrix of logical values (TRUE or FALSE), where TRUE marks a missing value in the corresponding cell of the dataset. The sum() function adds up these logical values, and since TRUE counts as 1, any missing value results in a sum > 0. In that case drop_na() removes the affected rows; chickwts has no missing values, so here the dataset is left unchanged. `%>%` is the pipe operator, which originates in the magrittr package and is made available by the tidyverse; it passes the result of one operation on to the next.
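
Since chickwts has no missing values, the cleaning branch never runs on it. To see drop_na() in action, here is a minimal sketch on a small made-up data frame (toy_df and its values are hypothetical, not part of chickwts):

```r
library(tidyr)

# Hypothetical toy data frame with one NA in each column
toy_df <- data.frame(
  weight = c(179, NA, 136, 227),
  feed   = factor(c("horsebean", "linseed", NA, "casein"))
)

# Count missing values per column
na_per_column <- colSums(is.na(toy_df))
print(na_per_column)

# drop_na() removes every row that has an NA in any column
clean_df <- drop_na(toy_df)
print(clean_df)
```

Note that drop_na() returns the data unchanged when there is nothing to remove, so it can also be called without the surrounding if check.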

Step 2: Create the pair plot

We will use the GGally package, which extends ggplot2 and provides the ggpairs() function, the ggplot2 equivalent of the base R pairs() function. GGally is not part of the tidyverse, so it must be installed separately with install.packages("GGally"). Unlike pairs(), ggpairs() picks a suitable plot for each pair of variables according to their types: for two numeric variables it shows the correlation coefficient, and on the diagonal it shows the distribution of each individual variable. This makes it more informative than pairs().

R




library(tidyverse)
library(GGally)
 
# Storing our chickwts dataset in a different variable
chickwts_df <- chickwts
 
# Checking for existence of NA values and removing these rows
if (sum(is.na(chickwts_df)) > 0) {
  chickwts_df <- chickwts_df %>% drop_na()
  print("Cleaned the dataset of NA values")
} else {
  print("No NA values found.")
}
 
# pair plot using ggpairs
ggpairs(chickwts_df, columns = c("weight", "feed"))


Output:

Pair plot from scratch with tidyverse

Inside the ggpairs() function, we specify the dataset and the columns to include in the pair plot.

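It has generated 4 plots: a density plot of weight (top-left), box plots of weight for each feed type (top-right), histograms of weight split by feed type (bottom-left), and a bar chart of the number of observations per feed type (bottom-right).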
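
The panel types ggpairs() picks depend on whether each variable is continuous or discrete, and they can be set explicitly through the upper, lower, and diag arguments. The following is a minimal sketch; the panel names used here are assumed to match the defaults of the installed GGally version, and panel_types is an illustrative helper list.

```r
library(GGally)

# Panel types spelled out explicitly (assumed to mirror the ggpairs() defaults)
panel_types <- list(
  upper = list(combo = "box_no_facet"),        # numeric vs factor, above the diagonal
  lower = list(combo = "facethist"),           # numeric vs factor, below the diagonal
  diag  = list(continuous = "densityDiag",     # single numeric variable
               discrete   = "barDiag")         # single factor variable
)

pair_plot <- ggpairs(
  chickwts,
  columns = c("weight", "feed"),
  upper = panel_types$upper,
  lower = panel_types$lower,
  diag  = panel_types$diag
)
print(pair_plot)
```

Swapping one of the names, for example setting the lower combo panel to "box", changes only that part of the matrix.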

Now let’s improve the look of the plot. First, we will add colors by mapping the color aesthetic to the `feed` column, and set `alpha = 0.5` to make the plot semi-transparent so that overlapping layers are easier to see.

R




library(tidyverse)
library(GGally)
 
# Storing our chickwts dataset in a different variable
chickwts_df <- chickwts
 
# Checking for existence of NA values and removing these rows
if (sum(is.na(chickwts_df)) > 0) {
  chickwts_df <- chickwts_df %>% drop_na()
  print("Cleaned the dataset of NA values")
} else {
  print("No NA values found.")
}
 
# pair plot using ggpairs
ggpairs(chickwts_df, columns = c("weight", "feed"), aes(color = feed, alpha = 0.5))


Output :

Pair Plot from scratch with tidyverse

Finally, we will add a legend and set a theme for the plot.

R




library(tidyverse)
library(GGally)
 
# Storing our chickwts dataset in a different variable
chickwts_df <- chickwts
 
# pair plot using ggpairs
ggpairs(chickwts_df, columns = c("weight", "feed"), aes(color = feed,
              alpha = 0.5), legend = 1) + theme_bw()


Output:

Pair Plot from scratch with tidyverse

Here we added a legend by writing `legend = 1`, which takes the legend from the panel at position 1 and displays it for the whole plot matrix; the default value is NULL, which means no legend. The legend lists every aesthetic set in aes(). We also added theme_bw() to get a simple theme with a white background and a dark border around each panel.

Pair plot from scratch with tidyverse

In the R programming language, a pair plot is a matrix of plots that shows the relationship between each pair of variables in a given dataset.
The pairs() function in base R creates pair plots, specifically scatter plots. The syntax is:

pairs(dataset)

where dataset is the name of the data frame. The function draws a matrix of scatter plots, one for each pair of variables in the data frame.

Pair plots give a sense of the distribution of the different variables in a dataset and help us identify potential problems or patterns in it.
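
The matrix layout is easier to appreciate on a dataset with several numeric columns. As an illustration, here is a short sketch using the built-in iris dataset (chosen for this example only; it is not used elsewhere in this article):

```r
# Keep only the numeric columns of the built-in iris dataset
iris_numeric <- iris[, sapply(iris, is.numeric)]

# With p variables, the matrix has p * (p - 1) off-diagonal scatter panels
p <- ncol(iris_numeric)
n_scatter_panels <- p * (p - 1)
print(n_scatter_panels)

# Draw the scatter-plot matrix
pairs(iris_numeric)
```

Each pair of variables appears twice, once above and once below the diagonal, with the axes swapped.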
