where() method
where() is similar to the filter() function: it takes a condition and returns a new DataFrame containing only the rows that satisfy that condition, discarding the rest. The condition can reference one or more columns of the DataFrame.
Syntax: dataframe.where(condition)
Example 1: Python program to filter on multiple columns
Python3
# select rows where ID is less than 3,
# or NAME is 'sridevi' and Company is 'company 1'
dataframe.where((dataframe.ID < 3) |
                ((dataframe.NAME == 'sridevi') &
                 (dataframe.Company == 'company 1'))).show()
Output:
PySpark – DataFrame filter on multiple columns
In this article, we are going to filter a DataFrame on multiple columns by using the filter() and where() functions in PySpark.
Creating a DataFrame for demonstration:
Python3
# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of employee data
data = [[1, "sravan", "company 1"],
        [2, "ojaswi", "company 1"],
        [3, "rohith", "company 2"],
        [4, "sridevi", "company 1"],
        [1, "sravan", "company 1"],
        [4, "sridevi", "company 1"]]

# specify column names
columns = ['ID', 'NAME', 'Company']

# creating a dataframe from the list of data
dataframe = spark.createDataFrame(data, columns)

# display the dataframe
dataframe.show()
Output: