Filtering or subsetting rows in R using dplyr

In this article, we filter the rows of a dataframe in the R programming language using the dplyr package.

Method 1: Subset or filter a row using filter()

To filter or subset rows, we use the filter() function.

Syntax:

filter(dataframe, condition)

Here, dataframe is the input dataframe and condition is a logical expression; filter() keeps only the rows for which it evaluates to TRUE.
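The call can be tried on a small scale first. The following is a minimal sketch, assuming a hypothetical four-row data frame named staff (it is not the article's data); it shows the direct call and the equivalent piped call.

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# keep only the rows for which the condition is TRUE
sales_rows <- filter(staff, department == "sales")

# the same filter written with the pipe operator
sales_piped <- staff %>% filter(department == "sales")

print(sales_rows)
```

Both forms return the two sales rows; which one to use is mostly a matter of style.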

Example: R program to filter the data frame

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display the actual dataframe
print(data)
print("==========================")

# keep only the rows where department is sales
print(filter(data,department=="sales"))

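Method 2: Filter dataframe with multiple conditions

The filter() function also accepts several conditions. They can be passed as separate arguments, or combined into a single expression with the & (AND) and | (OR) operators.

Syntax:

filter(dataframe, condition1, condition2, ..., condition n)

Here, dataframe is the input dataframe and the conditions are logical expressions. Conditions passed as separate arguments are combined with AND, so a row is kept only if all of them are TRUE.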
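How conditions combine is easy to get wrong when & and | are mixed. The sketch below, which assumes a hypothetical data frame named staff rather than the article's data, checks that comma-separated conditions behave like &, and that & is evaluated before | unless parentheses say otherwise.

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# conditions separated by commas are combined with a logical AND
with_commas <- filter(staff, department == "sales", salary > 27000)
with_and    <- filter(staff, department == "sales" & salary > 27000)

# & is evaluated before |, so these two calls select the same rows
implicit <- filter(staff, department == "sales" & salary > 27000 | salary < 40000)
explicit <- filter(staff, (department == "sales" & salary > 27000) | salary < 40000)

# moving the parentheses changes which rows are kept
regrouped <- filter(staff, department == "sales" & (salary > 27000 | salary < 40000))

print(implicit$id)
print(regrouped$id)
```

Adding explicit parentheses whenever both operators appear in one condition keeps the intent readable.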

Example: R program to filter multiple rows

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# keep the rows where department is sales and
# salary is greater than 27000
print(filter(data,department=="sales" & salary >27000))

Example: Filter rows by OR operator

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# keep the rows where department is IT or salary
# is greater than 27000
print(filter(data,department=="IT" |  salary >27000))

Example: R program to filter using and, or

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# keep the rows where department is sales and salary
# is greater than 27000, or where salary is less than
# 5000 (& is evaluated before |)
print(filter(data,department=="sales" & salary >27000 | salary<5000))

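Method 3: Using slice_head() function

The slice_head() function returns the first n rows of the dataframe.

Syntax:

dataframe %>% slice_head(n = number_of_rows)

where dataframe is the input dataframe, %>% is the pipe operator, which passes the dataframe on its left as the first argument of the function on its right, and n is the number of rows to return. In current dplyr versions, n has to be passed by name (for example n = 3).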
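As a small illustration, the sketch below assumes a hypothetical data frame named staff and selects the leading rows in two ways: by count with n, and by share of rows with the prop argument that slice_head() also accepts.

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# first two rows, selected by count
first_two <- staff %>% slice_head(n = 2)

# first half of the rows, selected by proportion
first_half <- staff %>% slice_head(prop = 0.5)

print(first_two)
print(first_half)
```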

Example: R program that uses slice_head() to filter rows

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# display the first 3 rows with slice_head
data %>% slice_head(n=3)
print("==========================")

# display the first 5 rows with slice_head
data %>% slice_head(n=5)
print("==========================")

# display the first row with slice_head
data %>% slice_head(n=1)

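Method 4: Using slice_tail() function

The slice_tail() function returns the last n rows of the dataframe.

Syntax:

dataframe %>% slice_tail(n = number_of_rows)

where dataframe is the input dataframe, %>% is the pipe operator, which passes the dataframe on its left as the first argument of the function on its right, and n is the number of rows to return, counted from the end. In current dplyr versions, n has to be passed by name.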
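The following sketch, again on a hypothetical data frame named staff, shows that the selected rows keep their original order and that asking for more rows than exist simply returns the whole data frame.

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# last two rows, in the order they appear in staff
last_two <- staff %>% slice_tail(n = 2)

# n is larger than the number of rows, so every row is returned
all_rows <- staff %>% slice_tail(n = 10)

print(last_two)
print(nrow(all_rows))
```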

Example: R program to filter last rows by using slice_tail() method

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# display the last 3 rows with slice_tail
data %>% slice_tail(n=3)
print("==========================")

# display the last 5 rows with slice_tail
data %>% slice_tail(n=5)
print("==========================")

# display the last row with slice_tail
data %>% slice_tail(n=1)

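Method 5: Using top_n() function

The top_n() function returns the n rows with the largest values of a weighting column wt. If wt is not given, the last column of the dataframe is used (salary in the example below). The dplyr documentation marks top_n() as superseded by slice_max() and slice_min().

Syntax:

dataframe %>% top_n(n, wt)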
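To make the role of the weighting column concrete, the sketch below assumes a hypothetical data frame named staff and names the column explicitly instead of relying on the last-column default. It also shows the slice_max() call that selects the same rows.

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# two rows with the highest salary, weighting column given explicitly
top_two <- staff %>% top_n(2, salary)

# the same selection written with slice_max()
top_two_new <- staff %>% slice_max(salary, n = 2)

print(top_two)
print(top_two_new)
```

Naming the column avoids surprises if the column order of the dataframe changes later.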

Example: R program that filters rows using top_n() function

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00,78900.00, 
                         25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# display the 3 rows with the highest salary
# (wt is omitted, so the last column is used)
data %>% top_n(n=3)
print("==========================")

# display the 5 rows with the highest salary
data %>% top_n(n=5)
print("==========================")

# display the row with the highest salary
data %>% top_n(n=1)

Method 6: Using slice_sample() function

The slice_sample() function returns n rows chosen at random from the dataframe, so the result generally differs from run to run.

Syntax:

dataframe %>% slice_sample(n = number_of_rows)
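Because the selection is random, fixing the seed is the usual way to make a run repeatable. The sketch below assumes a hypothetical data frame named staff and an arbitrary seed value of 42.

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# fix the random seed (arbitrary value) so the sample can be reproduced
set.seed(42)

# pick two rows at random, without replacement
picked <- staff %>% slice_sample(n = 2)

print(picked)
```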

Example: R program to filter rows using slice_sample() function

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# display 3 randomly chosen rows with slice_sample
data %>% slice_sample(n=3)
print("==========================")

# display 5 randomly chosen rows with slice_sample
data %>% slice_sample(n=5)
print("==========================")

# display 1 randomly chosen row with slice_sample
data %>% slice_sample(n=1)

Method 7: Using slice_max() function

The slice_max() function returns the n rows with the largest values in a given column.

Syntax:

dataframe %>% slice_max(column, n = number_of_rows)

where dataframe is the input dataframe, column is the column whose values are ranked, and n is the number of rows to return. When rows are tied on that column, more than n rows may be returned.
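The effect of ties is easiest to see on a tiny example. The sketch below assumes a hypothetical data frame named staff_tied in which two rows share the highest salary, and uses the with_ties argument of slice_max() to switch the tie handling off.

```r
# load the package
library(dplyr)

# hypothetical data frame in which two rows share the highest salary
staff_tied <- data.frame(id = c(1, 2, 3, 4),
                         department = c("IT", "sales", "HR", "sales"),
                         salary = c(50000, 50000, 40000, 30000))

# default behaviour: tied rows are all kept, so n = 1 returns two rows
top_with_ties <- staff_tied %>% slice_max(salary, n = 1)

# with_ties = FALSE returns exactly n rows
top_no_ties <- staff_tied %>% slice_max(salary, n = 1, with_ties = FALSE)

print(nrow(top_with_ties))
print(nrow(top_no_ties))
```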

Example: R program to filter using slice_max() function

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# return top 3 maximum rows based on salary  
# column in the dataframe 
print(data %>% slice_max(salary, n = 3)) 
print("==========================") 
  
# return top 5 maximum rows based on department  
# column in the dataframe 
print(data %>% slice_max(department, n = 5)) 
print("==========================")

Method 8: Using slice_min() function

The slice_min() function returns the n rows with the smallest values in a given column.

Syntax:

dataframe %>% slice_min(column, n = number_of_rows)

where dataframe is the input dataframe, column is the column whose values are ranked, and n is the number of rows to return. When rows are tied on that column, more than n rows may be returned.
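One way to read slice_min() is as a shortcut for sorting by the column and keeping the leading rows. The sketch below assumes a hypothetical data frame named staff with no tied salaries and compares the two approaches.

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# two rows with the lowest salary
lowest_two <- staff %>% slice_min(salary, n = 2)

# the same rows obtained by sorting and taking the first two
sorted_two <- staff %>% arrange(salary) %>% slice_head(n = 2)

print(lowest_two)
print(sorted_two)
```

With tied values the two approaches can differ, since slice_min() keeps ties by default while slice_head() always returns exactly n rows.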

Example: R program to filter using slice_min()

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# return top 3 minimum rows based on salary  
# column in the dataframe 
print(data %>% slice_min(salary, n = 3)) 
print("==========================") 
  
# return top 5 minimum rows based on department  
# column in the dataframe 
print(data %>% slice_min(department, n = 5)) 
print("==========================")

Method 9: Using sample_frac() function

The sample_frac() function selects a random fraction of the rows of a dataframe (or table). The first argument is the dataframe and the second is the fraction of rows to select.

Syntax:

sample_frac(dataframe, fraction)

where dataframe is the input dataframe and fraction is the share of rows to return, for example 0.2 for 20% of the rows.
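The relation between the fraction and the number of rows returned can be checked directly. The sketch below assumes a hypothetical four-row data frame named staff and an arbitrary seed; it also shows slice_sample(prop = ...), which the dplyr documentation lists as the newer replacement for sample_frac().

```r
# load the package
library(dplyr)

# hypothetical four-row data frame, used only for this sketch
staff <- data.frame(id = c(1, 2, 3, 4),
                    department = c("IT", "sales", "HR", "sales"),
                    salary = c(34500, 56000, 45000, 25000))

# fix the random seed (arbitrary value) so the sample can be reproduced
set.seed(1)

# half of the four rows, chosen at random
half <- sample_frac(staff, 0.5)

# the same idea written with slice_sample()
half_new <- staff %>% slice_sample(prop = 0.5)

print(nrow(half))
print(nrow(half_new))
```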

Example: R program to filter data using sample_frac() function

R

# load the package 
library(dplyr) 
  
# create the dataframe with three columns 
# id , department and salary with 8 rows 
data=data.frame(id=c(7058,7059,7060,7089,7072,7078,7093,7034), 
                  
                department=c('IT','sales','finance','IT','finance', 
                             'sales','HR','HR'), 
                  
                salary=c(34500.00,560890.78,67000.78,25000.00, 
                         78900.00,25000.00,45000.00,90000)) 
  
# display actual  dataframe 
print(data) 
print("==========================") 
  
# return roughly 20% of the 8 rows
print(sample_frac(data,0.2))
print("==========================")

# return roughly 40% of the 8 rows
print(sample_frac(data,0.4))
print("==========================")

# return roughly 70% of the 8 rows
print(sample_frac(data,0.7))
print("==========================")

