How to Use “NOT IN” Filter in Pandas?
A "NOT IN" filter keeps only the rows of a DataFrame whose values do not appear in a given list of values.
Pandas has no dedicated NOT IN method, but the same result can be obtained by negating the Boolean mask returned by the isin() method with the ~ operator.
This tutorial walks step by step through applying a NOT IN filter to a Pandas DataFrame.
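To see why the negation works, it can help to look at the Boolean mask itself. The following is a minimal sketch on a small made-up Series; the variable names `names`, `excluded`, `in_mask`, and `not_in_mask` are illustrative assumptions, not part of the tutorial's examples.

```python
import pandas as pd

# Illustrative data; these names are assumptions made for this sketch
names = pd.Series(['sravan', 'harsha', 'jyothika'])
excluded = ['harsha', 'jyothika']

# isin() returns True where the value is found in the list
in_mask = names.isin(excluded)

# ~ flips every Boolean, which gives the NOT IN mask
not_in_mask = ~in_mask

# Boolean indexing keeps only the entries where the mask is True
remaining = names[not_in_mask]
print(remaining.tolist())  # ['sravan']
```

Note that Python's `not` keyword cannot replace `~` here: `not names.isin(excluded)` raises a ValueError because the truth value of a whole Series is ambiguous.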
Create a Sample DataFrame
Python3
# import pandas module
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]},
                     index=[0, 1, 2])

# display the dataframe
print(data1)
Output: the complete DataFrame, with three rows (index 0 to 2) and the columns name, subject1, and marks.
Method 1: Use NOT IN Filter in One Column
To apply NOT IN to a single column, pass a list of values to isin() and negate the result.
The filter keeps only the rows whose value in that column is not present in the list.
Syntax: dataframe[~dataframe[column_name].isin(list)]
where,
- dataframe is the input dataframe
- column_name is the column that is filtered
- list is the list of values to exclude; rows whose value in that column appears in it are dropped
Example: Using NOT IN filter in one column of a DataFrame.
Python3
# import pandas module
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]},
                     index=[0, 1, 2])

# values to exclude from the name column
list1 = ['harsha', 'jyothika']

# filter on the name column
print(data1[~data1['name'].isin(list1)])

print("============")

# values to exclude from the subject1 column
list2 = ['R']

# filter on the subject1 column
print(data1[~data1['subject1'].isin(list2)])

print("============")

# values to exclude from the marks column
list3 = [96, 89]

# filter on the marks column
print(data1[~data1['marks'].isin(list3)])
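Output: the first filter keeps only the row for sravan (index 0), the second keeps the rows for sravan and jyothika (index 0 and 2), and the third keeps only the row for jyothika (index 2). The three results are separated by lines of equals signs.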
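If the same single-column filter is needed in several places, it could be wrapped in a small function. The sketch below assumes a helper named `exclude_values`; that name and its signature are hypothetical and not part of Pandas.

```python
import pandas as pd


def exclude_values(df, column, values):
    """Return the rows of df whose value in `column` is not in `values`.

    Hypothetical helper for illustration; it is not a Pandas function.
    """
    return df[~df[column].isin(values)]


data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]})

# drop the rows whose marks are 96 or 89
result = exclude_values(data1, 'marks', [96, 89])
print(result)
```

The filter returns a new DataFrame and leaves data1 unchanged. The surviving rows keep their original index labels, so reset_index(drop=True) can be chained on if a fresh 0-based index is wanted.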
Method 2: Use NOT IN Filter in Multiple Columns
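The filter can also be applied to more than one column at once. Select the columns with a list inside the indexing brackets, for example dataframe[['col1', 'col2']], and call isin() on the result, which returns a Boolean DataFrame with one flag per cell. The any(axis=1) method then reduces it to one flag per row, which is True if at least one of the selected columns holds a listed value. Negating that with ~ keeps only the rows in which none of the selected columns match.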
Syntax: dataframe[~dataframe[[columns]].isin(list).any(axis=1)]
Example: Using NOT IN filter in multiple columns of the DataFrame.
Python3
# import pandas module
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]},
                     index=[0, 1, 2])

# consider a list
list1 = ['harsha', 'jyothika', 96]

# filter in name and marks column
print(data1[~data1[['name', 'marks']].isin(list1).any(axis=1)])

print("============")

# consider a list
list2 = ['R', 'sravan']

# filter in name and subject1 column
print(data1[~data1[['subject1', 'name']].isin(list2).any(axis=1)])
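Output: the first filter prints an empty DataFrame, because every row matches list1 in either the name or the marks column. The second filter keeps only the row for jyothika (index 2).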
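The one-line multi-column filter can be easier to follow when split into its intermediate steps. The sketch below does this for the second filter; the variable names `values`, `cell_matches`, and `row_matches` are introduced here only for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]})
values = ['R', 'sravan']

# Step 1: isin() on two columns gives a Boolean DataFrame, one flag per cell
cell_matches = data1[['subject1', 'name']].isin(values)

# Step 2: any(axis=1) is True for a row if at least one of its cells matched
row_matches = cell_matches.any(axis=1)

# Step 3: ~ inverts the row flags, keeping rows with no match in either column
result = data1[~row_matches]
print(result)
```

Swapping any(axis=1) for all(axis=1) would change the rule: a row would then be dropped only if every selected column held a listed value.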
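Method 3: Use NumPy with NOT IN Filter
NumPy's isin() function builds the same kind of Boolean mask as the Pandas isin() method, so it can be negated with ~ and used to index the DataFrame in the same way.
Syntax: dataframe[~numpy.isin(dataframe['column'], list)]
Example: Using the NOT IN filter with NumPy's isin() function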
Python3
# import numpy and pandas modules
import numpy as np
import pandas as pd

# create dataframe
data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]},
                     index=[0, 1, 2])

# consider a list
list1 = ['harsha', 'jyothika', 96]

# filter on the name column and display the result
print(data1[~np.isin(data1['name'], list1)])
Output: only the row for sravan (index 0) remains.
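As a quick check that the NumPy and Pandas routes agree, the two masks can be compared directly. This is a sketch under the assumption that the excluded values are plain strings; `excluded`, `np_mask`, and `pd_mask` are names made up for the comparison.

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({'name': ['sravan', 'harsha', 'jyothika'],
                      'subject1': ['python', 'R', 'php'],
                      'marks': [96, 89, 90]})
excluded = ['harsha', 'jyothika']

# numpy.isin returns a plain NumPy Boolean array
np_mask = ~np.isin(data1['name'], excluded)

# Series.isin returns a Boolean Series that carries the DataFrame's index
pd_mask = ~data1['name'].isin(excluded)

# both masks flag the same rows
print(np.array_equal(np_mask, pd_mask.to_numpy()))
print(data1[np_mask])
```

The main practical difference is the return type: the NumPy mask has no index, so it is matched to the DataFrame rows by position, while the Pandas mask is aligned by index label.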
Conclusion
The NOT IN filter lets you exclude the rows of a DataFrame whose values appear in a given list. It is useful for tasks such as data cleaning and conditional data handling.
In this tutorial, we covered how to use the NOT IN filter in a Pandas DataFrame. We applied the negation operator (~) to single and multiple columns, combined the isin() and any() methods for multi-column filtering, and built the same filter with NumPy's isin() function.