How to find duplicate values in a list in R
In this article, we will see how to find duplicate values in a list in the R programming language across several scenarios.
Finding duplicate values in a List
In R, the duplicated() function finds duplicate values in R objects such as vectors and lists. It checks which elements of a list are duplicates and returns a logical vector (holding TRUE/FALSE values) of the same length as the input. An element gets TRUE if an identical element appears earlier in the object. Otherwise, including for the first occurrence of each value, it gets FALSE.
Syntax:
duplicated(List_name)
Here, List_name is the input list.
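Two details of duplicated() are worth a quick check. The sketch below uses small hypothetical lists (pairs and nums are illustration names, not part of the examples that follow). First, each list element is compared as a whole. Second, the optional fromLast argument, which is part of duplicated()'s documented signature, reverses the scan so that the last occurrence is treated as the original instead of the first.

```r
# Hypothetical example lists, for illustration only
pairs <- list(c(1, 2), c(1, 2), c(2, 1))
nums  <- list(1, 2, 1)

# Whole elements are compared: c(1, 2) repeats, but c(2, 1) is a different element
print(duplicated(pairs))
# [1] FALSE  TRUE FALSE

# fromLast = TRUE scans from the end, so the *earlier* copy of 1 is flagged
print(duplicated(nums, fromLast = TRUE))
# [1]  TRUE FALSE FALSE
```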
Let’s create a list with 10 values and find the duplicates.
# Create a list
List_data <- list(1, 2, 3, 4, 5, 6, 7, 5, 4, 3)
print(List_data)
# Find duplicates in the above list
print(duplicated(List_data))
Output:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
[[5]]
[1] 5
[[6]]
[1] 6
[[7]]
[1] 7
[[8]]
[1] 5
[[9]]
[1] 4
[[10]]
[1] 3
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
The last three elements (5, 4 and 3) already appear earlier in the list, so TRUE is returned for them.
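Because duplicated() returns one flag per element, its result can also be used for logical indexing, either to pull out the repeated elements or to drop them. A minimal sketch reusing the same List_data; the names dups and deduped are illustrative, and unlist() is applied only to keep the printed output compact.

```r
List_data <- list(1, 2, 3, 4, 5, 6, 7, 5, 4, 3)

# Keep only the elements flagged TRUE, i.e. the repeats
dups <- List_data[duplicated(List_data)]
print(unlist(dups))
# [1] 5 4 3

# Negating the flags keeps the first occurrence of every value,
# which matches what unique() returns for this list
deduped <- List_data[!duplicated(List_data)]
print(unlist(deduped))
# [1] 1 2 3 4 5 6 7
print(identical(deduped, unique(List_data)))
# [1] TRUE
```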
Let’s create a list that holds two lists and find the duplicates in each inner list separately.
# Create a list containing 2 lists
List_data <- list(list1 = list(100, 200, 300, 300, 300),
                  list2 = list("Java", "HTML", "PHP", "JSP", "Statistics"))
print(List_data)
# Find duplicates in list1 from List_data
print(duplicated(List_data$list1))
# Find duplicates in list2 from List_data
print(duplicated(List_data$list2))
Output:
$list1
$list1[[1]]
[1] 100
$list1[[2]]
[1] 200
$list1[[3]]
[1] 300
$list1[[4]]
[1] 300
$list1[[5]]
[1] 300
$list2
$list2[[1]]
[1] "Java"
$list2[[2]]
[1] "HTML"
$list2[[3]]
[1] "PHP"
$list2[[4]]
[1] "JSP"
$list2[[5]]
[1] "Statistics"
[1] FALSE FALSE FALSE TRUE TRUE
[1] FALSE FALSE FALSE FALSE FALSE
list1 contains two duplicates (the second and third occurrences of 300), while list2 has none.
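Instead of calling duplicated() once per inner list, the same check can be applied to every element of the outer list in a single call. This is a sketch using base R's lapply() as an alternative to the two explicit calls above; dup_flags is an illustrative name.

```r
List_data <- list(list1 = list(100, 200, 300, 300, 300),
                  list2 = list("Java", "HTML", "PHP", "JSP", "Statistics"))

# lapply() runs duplicated() on each inner list and keeps the names list1/list2
dup_flags <- lapply(List_data, duplicated)
print(dup_flags)
# $list1
# [1] FALSE FALSE FALSE  TRUE  TRUE
#
# $list2
# [1] FALSE FALSE FALSE FALSE FALSE
```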
Next, let’s create a list holding three vectors and find the duplicates in each vector.
# Create a list with 3 vectors
List_data <- list(Id = c(1, 2, 3, 4, 5, 4, 5),
                  Subject = c("Java", "HTML", "HTML", "Python"),
                  Marks = c(100, 89, 78, 69, 80))
print(List_data)
# Find duplicates in Id
duplicated(List_data$Id)
# Find duplicates in Subject
duplicated(List_data$Subject)
# Find duplicates in Marks
duplicated(List_data$Marks)
Output:
$Id
[1] 1 2 3 4 5 4 5
$Subject
[1] "Java" "HTML" "HTML" "Python"
$Marks
[1] 100 89 78 69 80
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
[1] FALSE FALSE TRUE FALSE
[1] FALSE FALSE FALSE FALSE FALSE
- Id holds two duplicate values, i.e. 4 and 5.
- Subject holds one duplicate value, i.e. “HTML”.
- There are no duplicates in the Marks vector.
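The logical vector can also be turned into the positions of the repeated entries, or into the list of values that repeat. A short sketch on the Id vector from the example above, using base R's which() and unique():

```r
List_data <- list(Id = c(1, 2, 3, 4, 5, 4, 5),
                  Subject = c("Java", "HTML", "HTML", "Python"),
                  Marks = c(100, 89, 78, 69, 80))

# which() converts the TRUE flags into positions within the vector
print(which(duplicated(List_data$Id)))
# [1] 6 7

# unique() on the flagged entries lists each repeated value once
print(unique(List_data$Id[duplicated(List_data$Id)]))
# [1] 4 5
```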
Let’s create a list holding two vectors and return the total number of duplicate elements in each. To do this, we pass the result of duplicated() to the sum() function: since TRUE counts as 1 and FALSE as 0, the sum equals the number of duplicates.
# Create a list with 2 vectors
List_data <- list(Id = c(1, 2, 3, 4, 5, 4, 5),
                  Subject = c("Java", "HTML", "HTML", "Python"))
print(List_data)
# Count duplicates in Id
sum(duplicated(List_data$Id))
# Count duplicates in Subject
sum(duplicated(List_data$Subject))
Output:
$Id
[1] 1 2 3 4 5 4 5
$Subject
[1] "Java" "HTML" "HTML" "Python"
[1] 2
[1] 1
There are two duplicates in the Id vector and one duplicate in the Subject vector.
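When a list holds many vectors, the same sum(duplicated(...)) count can be computed for all of them at once. A minimal sketch using sapply(), which simplifies the per-vector counts into one named integer vector; dup_counts is an illustrative name.

```r
List_data <- list(Id = c(1, 2, 3, 4, 5, 4, 5),
                  Subject = c("Java", "HTML", "HTML", "Python"))

# Apply the duplicate count to every vector; names Id/Subject are kept
dup_counts <- sapply(List_data, function(v) sum(duplicated(v)))
print(dup_counts)
# expected counts: Id = 2, Subject = 1
```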
Conclusion
In conclusion, identifying duplicate values in a list in R is essential for data cleaning and quality assurance. Using the duplicated() function, on its own or combined with sum(), we can efficiently detect, count and handle duplicate values.