What is the Gather Function?

The gather() function is used to transform wide datasets into long datasets, making it easier to work with and analyze the data. It takes multiple columns and collapses them into key-value pairs, resulting in a dataset with fewer columns and more rows.

The syntax of the gather() function is as follows:

Syntax:

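gather(data, key = "key", value = "value", ..., na.rm = FALSE, convert = FALSE, factor_key = FALSE)

  • data: The input data frame.
  • key: The name of the new key column that will contain the column names of the original dataset.
  • value: The name of the new value column that will contain the values from the original dataset.
  • ...: Columns to gather. You can specify multiple columns separated by commas, a range such as A:C, or exclude columns with a minus sign such as -ID. If nothing is specified, all columns are gathered.
  • na.rm: Boolean value indicating whether to remove rows whose value is NA from the output. Default is FALSE.
  • convert: Boolean value indicating whether to run type.convert() on the key column, which is useful when the gathered column names are really numbers or logical values. Default is FALSE.
  • factor_key: Boolean value indicating whether to store the key column as a factor that preserves the original column order, rather than as a character vector. Default is FALSE.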
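As a small illustration of the na.rm and convert arguments, the sketch below uses a made-up scores data frame (the data and the names scores, long_default and long_clean are assumptions for illustration, not part of the original examples). With na.rm = TRUE the row whose value is NA is dropped, and with convert = TRUE the key column, which holds year-like column names, should come back as an integer instead of a character vector.

```r
library(tidyr)

# Hypothetical data: the column names are years and one score is missing
scores <- data.frame(
  ID = 1:2,
  `2019` = c(5, NA),
  `2020` = c(7, 8),
  check.names = FALSE  # keep "2019"/"2020" instead of "X2019"/"X2020"
)

# Defaults: the NA row is kept and the key column stays character
long_default <- gather(scores, key = "Year", value = "Score", -ID)

# na.rm drops the NA row; convert runs type.convert() on the key column
long_clean <- gather(scores, key = "Year", value = "Score", -ID,
                     na.rm = TRUE, convert = TRUE)

# Compare the structure of the two results
str(long_default)
str(long_clean)
```

Comparing the two str() outputs shows both the row count and the type of the Year column changing.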

Let’s walk through some examples to better understand how the gather() function works.

R
# Load the tidyr package
library(tidyr)

# Create sample wide dataset
wide_data <- data.frame(
  ID = 1:3,
  A = c(10, 20, 30),
  B = c(15, 25, 35),
  C = c(12, 22, 32)
)

# Print the wide dataset
print(wide_data)

# Gather the data into long format
long_data <- gather(wide_data, key = "Variable", value = "Value", -ID)

# Print the long format data
print(long_data)

Output:

  ID  A  B  C
1  1 10 15 12
2  2 20 25 22
3  3 30 35 32

  ID Variable Value
1  1        A    10
2  2        A    20
3  3        A    30
4  1        B    15
5  2        B    25
6  3        B    35
7  1        C    12
8  2        C    22
9  3        C    32

The gather function has effectively transformed the data from wide format to long format, making it easier to perform certain types of analyses or visualizations.
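To make the benefit concrete, here is a minimal sketch of one analysis that becomes simpler in long format: computing the mean of every gathered column with a single call. It rebuilds the same wide_data as above and uses base R's aggregate(); the name means is chosen here purely for illustration.

```r
library(tidyr)

# Same wide dataset as in the example above
wide_data <- data.frame(
  ID = 1:3,
  A = c(10, 20, 30),
  B = c(15, 25, 35),
  C = c(12, 22, 32)
)

# Reshape to long format
long_data <- gather(wide_data, key = "Variable", value = "Value", -ID)

# One formula summarises every gathered column at once
means <- aggregate(Value ~ Variable, data = long_data, FUN = mean)
print(means)
```

In wide format the same summary would need one call per column or a helper such as colMeans(); in long format the grouping variable is simply another column.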

Gathering Multiple Variables

The same approach works when several columns hold the same measurement recorded at different times, such as an age column for each year.

R
# Create sample wide dataset
wide_data <- data.frame(
  ID = 1:3,
  Age_2019 = c(25, 30, 35),
  Age_2020 = c(26, 31, 36),
  Age_2021 = c(27, 32, 37)
)
# Print the wide dataset
print(wide_data)
# Gather the data into long format
long_data <- gather(wide_data, key = "Year", value = "Age", -ID)

# Print the long format data
print(long_data)

Output:

  ID Age_2019 Age_2020 Age_2021
1  1       25       26       27
2  2       30       31       32
3  3       35       36       37

  ID     Year Age
1  1 Age_2019  25
2  2 Age_2019  30
3  3 Age_2019  35
4  1 Age_2020  26
5  2 Age_2020  31
6  3 Age_2020  36
7  1 Age_2021  27
8  2 Age_2021  32
9  3 Age_2021  37

This example gathers multiple variables (Age_2019, Age_2020, Age_2021) into a long format. The “Year” column holds the original column names (for example, Age_2019), which carry the year information, and the “Age” column contains the corresponding ages.
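Because the gathered key values are the full column names rather than bare years, a follow-up step is often useful. The sketch below is one possible way to do it, using base R's sub() and as.integer() to strip the Age_ prefix; it assumes every gathered column name follows the Age_<year> pattern.

```r
library(tidyr)

# Same wide dataset as in the example above
wide_data <- data.frame(
  ID = 1:3,
  Age_2019 = c(25, 30, 35),
  Age_2020 = c(26, 31, 36),
  Age_2021 = c(27, 32, 37)
)

# Reshape to long format; Year initially holds "Age_2019", "Age_2020", ...
long_data <- gather(wide_data, key = "Year", value = "Age", -ID)

# Strip the "Age_" prefix and store the year as an integer
long_data$Year <- as.integer(sub("^Age_", "", long_data$Year))

str(long_data)
```

With Year stored as a number, the data can be sorted, filtered, or plotted along a proper time axis.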

Gathering Categorical Data

The gather() function also works on categorical (character) columns.

R
# Create sample wide dataset
wide_data <- data.frame(
  ID = 1:3,
  Gender = c("Male", "Female", "Male"),
  Ethnicity = c("Asian", "Caucasian", "African American")
)
# Print the wide dataset
print(wide_data)
# Gather the data into long format
long_data <- gather(wide_data, key = "Category", value = "Value", -ID)

# Print the long format data
print(long_data)

Output:

  ID Gender        Ethnicity
1  1   Male            Asian
2  2 Female        Caucasian
3  3   Male African American

  ID  Category            Value
1  1    Gender             Male
2  2    Gender           Female
3  3    Gender             Male
4  1 Ethnicity            Asian
5  2 Ethnicity        Caucasian
6  3 Ethnicity African American

This example gathers categorical variables (Gender, Ethnicity) into a long format, where the “Category” column contains the variable names and the “Value” column contains the corresponding values.
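One point worth checking when gathering categorical data is what happens if the gathered columns have different types. The sketch below uses a hypothetical mixed_data frame (not part of the original example) with one character column and one numeric column. As far as we understand tidyr's behaviour, the single value column has to hold both, so the numbers end up stored as character strings.

```r
library(tidyr)

# Hypothetical data: one character column and one numeric column
mixed_data <- data.frame(
  ID = 1:3,
  Gender = c("Male", "Female", "Male"),
  Age = c(25, 30, 35),
  stringsAsFactors = FALSE  # keep Gender as character on older R versions
)

# Gather both columns into a single value column
long_mixed <- gather(mixed_data, key = "Category", value = "Value", -ID)

# Inspect the type of the combined value column
str(long_mixed)
```

If the numeric values are needed later, it is usually safer to gather columns of the same type together and keep the others as identifier columns.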

How to Use Gather Function in R

In data analysis and manipulation, it’s often necessary to reshape datasets for better comprehension or analysis. The gather() function, part of the tidyr package for the R programming language, is a powerful tool for reshaping data from wide to long format. This article explores the gather() function in detail, providing explanations and examples.

Conclusion

The gather() function in R is a powerful tool for reshaping wide datasets into long format, making them easier to analyze and work with. By understanding its syntax and usage through examples, you can effectively manipulate your data to suit your analytical needs.
