Merge Function In R
The merge() function in R combines two data frames based on common columns or keys. It performs database-style joins, similar to SQL joins, so that data from different sources ends up in a single data frame. In this article, we’ll explore the merge() function in detail, discussing its syntax, parameters, and examples to demonstrate its usage.
The basic syntax of the merge() function is as follows:
merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all, ...)
- x, y: The data frames to be merged.
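- by: A character vector naming the columns to merge on. By default, merge() uses the column names that x and y have in common, that is, intersect(names(x), names(y)). Passing by = NULL explicitly does not fall back to the common columns; it returns the Cartesian product of x and y.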
- by.x, by.y: The key columns in x and y, respectively. Use these when the key columns have different names in the two data frames. Both default to the value of by.
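- all: Logical; if TRUE, it performs a full outer join, retaining all rows from both x and y. If FALSE (the default), it performs an inner join, retaining only the rows with matching keys.
- all.x, all.y: Logical; all.x = TRUE keeps every row of x (a left join), and all.y = TRUE keeps every row of y (a right join). Both default to the value of all.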
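The by.x and by.y arguments are easiest to understand with key columns that are named differently. The following is a minimal sketch, assuming two hypothetical data frames, employees and depts, that are not part of this article's examples. It also shows what happens when by is set to NULL explicitly.

```r
# Hypothetical data frames whose key columns have different names
employees <- data.frame(emp_id = c(1, 2, 3),
                        Name = c("Johny", "Ali", "Boby"))
depts <- data.frame(ID = c(2, 3, 4),
                    Department = c("IT", "Finance", "HR"))

# Inner join: match employees$emp_id against depts$ID.
# The key column in the result keeps the name used in x (emp_id).
by_xy <- merge(employees, depts, by.x = "emp_id", by.y = "ID")
print(by_xy)

# by = NULL requests no key at all, so every row of x is paired
# with every row of y (3 x 3 = 9 rows).
cross <- merge(employees, depts, by = NULL)
print(nrow(cross))
```

Only the rows with emp_id 2 and 3 have a match in depts, so the inner join returns two rows.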
Let’s illustrate the usage of the merge() function with an example:
# Create two sample data frames
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Johny", "Ali", "Boby"),
Score = c(80, 85, 90))
df2 <- data.frame(ID = c(2, 3, 4), Department = c("IT", "Finance", "HR"))
# Print the original data frames
print("Original Data Frame 1:")
print(df1)
print("Original Data Frame 2:")
print(df2)
Output:
[1] "Original Data Frame 1:"
ID Name Score
1 1 Johny 80
2 2 Ali 85
3 3 Boby 90
[1] "Original Data Frame 2:"
ID Department
1 2 IT
2 3 Finance
3 4 HR
Now, let’s merge these data frames based on the common column “ID”:
# Merge data frames based on the common column "ID"
merged_df <- merge(df1, df2, by = "ID", all = TRUE)
# Print the merged data frame
print("Merged Data Frame:")
print(merged_df)
Output:
[1] "Merged Data Frame:"
ID Name Score Department
1 1 Johny 80 <NA>
2 2 Ali 85 IT
3 3 Boby 90 Finance
4 4 <NA> NA HR
In the merged data frame, each row corresponds to one ID value. IDs 2 and 3 appear in both inputs, so their rows are fully populated. ID 1 exists only in df1, so its Department is NA, and ID 4 exists only in df2, so its Name and Score are NA. The all = TRUE argument is what keeps these non-matching rows; without it, only IDs 2 and 3 would be returned.
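For comparison, a right join keeps every row of the second data frame and drops unmatched rows of the first. This is a small sketch that rebuilds the same df1 and df2 so it can be run on its own; right_df is a name chosen here for illustration.

```r
# Same sample data frames as in the example above
df1 <- data.frame(ID = c(1, 2, 3), Name = c("Johny", "Ali", "Boby"),
                  Score = c(80, 85, 90))
df2 <- data.frame(ID = c(2, 3, 4), Department = c("IT", "Finance", "HR"))

# Right join: keep all rows of df2 (y), matched or not
right_df <- merge(df1, df2, by = "ID", all.y = TRUE)
print(right_df)
```

The result contains IDs 2, 3, and 4. ID 1 is dropped because it has no match in df2, and ID 4 gets NA for Name and Score.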
Next, we demonstrate the merge() function on several data frames, using different join types.
# Sample data frames
df1 <- data.frame(ID = 1:5, Name = c("Anurag", "Shivang", "Vipul", "Jayesh", "Pratham"))
df2 <- data.frame(ID = c(2, 4, 6), Score = c(85, 92, 78))
df3 <- data.frame(ID = c(1, 2, 3), Age = c(25, 30, 35))
# Print original data frames
print("Original data frames:")
print(df1)
print(df2)
print(df3)
# Perform inner join on 'ID' column
merged_inner <- merge(x = df1, y = df2, by = "ID", all = FALSE)
print("Inner join:")
print(merged_inner)
# Perform left join on 'ID' column
merged_left <- merge(x = df1, y = df2, by = "ID", all.x = TRUE)
print("Left join:")
print(merged_left)
# Perform outer join on 'ID' column
merged_outer <- merge(x = df1, y = df2, by = "ID", all = TRUE)
print("Outer join:")
print(merged_outer)
# Outer join of df1 and df3; 'by' takes a character vector (here with the single key 'ID')
merged_multi <- merge(x = df1, y = df3, by = c("ID"), all = TRUE)
print("Merge with multiple columns:")
print(merged_multi)
Output:
[1] "Original data frames:"
ID Name
1 1 Anurag
2 2 Shivang
3 3 Vipul
4 4 Jayesh
5 5 Pratham
ID Score
1 2 85
2 4 92
3 6 78
ID Age
1 1 25
2 2 30
3 3 35
[1] "Inner join:"
ID Name Score
1 2 Shivang 85
2 4 Jayesh 92
[1] "Left join:"
ID Name Score
1 1 Anurag NA
2 2 Shivang 85
3 3 Vipul NA
4 4 Jayesh 92
5 5 Pratham NA
[1] "Outer join:"
ID Name Score
1 1 Anurag NA
2 2 Shivang 85
3 3 Vipul NA
4 4 Jayesh 92
5 5 Pratham NA
6 6 <NA> 78
[1] "Merge with multiple columns:"
ID Name Age
1 1 Anurag 25
2 2 Shivang 30
3 3 Vipul 35
4 4 Jayesh NA
5 5 Pratham NA
df1, df2, and df3 are three sample data frames. We demonstrate inner, left, and full outer joins using the merge() function. The last call merges df1 with df3 and passes by as a character vector. Because that vector contains only "ID", it is still a single-key merge. To merge on several columns at once, list all of the key columns in the by parameter.
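A merge on more than one key needs by to list several column names. The following is a minimal sketch, assuming two hypothetical data frames, sales and targets, that share the columns ID and Year. They are not part of the examples above.

```r
# Hypothetical data frames that share two key columns: ID and Year
sales <- data.frame(ID = c(1, 1, 2),
                    Year = c(2022, 2023, 2022),
                    Sales = c(100, 120, 90))
targets <- data.frame(ID = c(1, 2, 2),
                      Year = c(2023, 2022, 2023),
                      Target = c(110, 95, 105))

# Inner join on both keys: a row matches only if ID AND Year agree
multi_key <- merge(sales, targets, by = c("ID", "Year"))
print(multi_key)
```

Only the combinations (ID 1, Year 2023) and (ID 2, Year 2022) occur in both data frames, so the result has two rows.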