Get unique values from a column in Pandas DataFrame
The unique() method returns the distinct values of a column, keeping one entry for each value that occurs more than once. It does not modify the column itself. In this article, we will discuss how to get unique values from a column in a Pandas DataFrame.
Creating a Pandas Dataframe with Duplicate Elements
Create a sample Pandas DataFrame from a dictionary of lists, with columns named A, B, C, D, and E. Column A has no duplicates, while columns B to E contain progressively more duplicate elements.
Python3
# Import pandas package
import pandas as pd

# Create a dictionary with five fields each
data = {
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
    'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
    'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
    'E': ['E1', 'E1', 'E1', 'E1', 'E1']
}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
Get unique values from a column in Pandas DataFrame
Below are some examples by which we can get the unique values of a column in this dataframe.
- Get the Unique Values of ‘B’ Column
- Get the Unique Values of ‘E’ Column
- Get Number of Unique Values in a Column
- Using set() to Eliminate Duplicate Values from a Column
- Using pandas.concat() and unique() Methods
- Using Series.drop_duplicates()
Get the Unique Values of ‘B’ Column
In this example, we retrieve the unique values from the ‘B’ column using the unique() method. The resulting unique values are ['B1', 'B2', 'B3', 'B4'].
Python3
# Import pandas package
import pandas as pd

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Get the unique values of 'B' column
df.B.unique()
Output
array(['B1', 'B2', 'B3', 'B4'], dtype=object)
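As a small sketch of how unique() orders its result, the snippet below uses a hypothetical Series (made up for illustration, not part of the sample DataFrame). It shows that values come back in order of first appearance rather than sorted, so an explicit sort is needed when sorted output matters.

```python
import pandas as pd

# Hypothetical data, chosen only to illustrate ordering
s = pd.Series(['B3', 'B1', 'B3', 'B2', 'B1'])

# unique() keeps the order of first appearance; it does not sort
unique_in_order = list(s.unique())
print(unique_in_order)

# Sort explicitly when a sorted result is needed
unique_sorted = sorted(s.unique())
print(unique_sorted)
```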
Get the Unique Values of ‘E’ Column
In this example, we retrieve the unique values from the ‘E’ column using the unique() method. Every row in this column holds the same value, so the result contains a single element: ['E1'].
Python3
# Import pandas package
import pandas as pd

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Get the unique values of 'E' column
df.E.unique()
Output
array(['E1'], dtype=object)
Get Number of Unique Values in a Column
In this example, we count the unique values in the ‘C’ column using the nunique() method. Passing dropna=True, which is the default, excludes NaN values from the count. The result is 3, indicating there are three unique values in column ‘C’.
Python3
# Import pandas package
import pandas as pd

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Get number of unique values in column 'C'
df.C.nunique(dropna=True)
Output
3
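The sample column contains no missing values, so the effect of dropna is not visible there. The sketch below uses a hypothetical Series with one missing entry (an assumption for illustration only) to compare both settings.

```python
import pandas as pd

# Hypothetical column containing one missing value
s = pd.Series(['C1', 'C2', None, 'C2'])

# dropna=True ignores the missing value
count_without_nan = s.nunique(dropna=True)

# dropna=False counts the missing value as one more distinct entry
count_with_nan = s.nunique(dropna=False)

print(count_without_nan, count_with_nan)
```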
Eliminate Duplicate Values from a Column using set()
In this example, we pass column ‘C’ to Python's built-in set() function, which discards duplicates. The resulting set, {'C1', 'C2', 'C3'}, holds the unique values in column ‘C’. Because a set is unordered, the order in which the values are printed may vary.
Python3
# Import pandas package
import pandas as pd

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Use set() to eliminate duplicate values in column 'C'
unique_values_set = set(df['C'])

# Print the unique values
print(unique_values_set)
Output
{'C1', 'C2', 'C3'}
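Since a set carries no ordering, one way to get a reproducible result is to sort it into a list. The following is a minimal sketch that rebuilds only column ‘C’ so it can run on its own.

```python
import pandas as pd

# Rebuild only column 'C' from the sample data so the snippet is self-contained
df = pd.DataFrame({'C': ['C1', 'C2', 'C3', 'C3', 'C3']})

# sorted() turns the unordered set into a list with a stable order
unique_sorted = sorted(set(df['C']))
print(unique_sorted)
```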
Using pandas.concat() and unique() Methods
In this example, we concatenate all columns into a single Series using pd.concat() and then call unique() on the result. pd.concat() accepts only Series or DataFrame objects, so the columns are concatenated first and unique() is applied to the combined Series rather than to each column beforehand. The resulting array, when printed, displays all unique values from columns ‘A’ to ‘E’.
Python3
# Import pandas package
import pandas as pd

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Use pd.concat() to concatenate all columns and then apply unique()
unique_values_all_columns = pd.concat([df[col] for col in df.columns]).unique()

# Print the unique values
print(unique_values_all_columns)
Output
['A1' 'A2' 'A3' 'A4' 'A5' 'B1' 'B2' 'B3' 'B4' 'C1' 'C2' 'C3' 'D1' 'D2' 'E1']
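When the unique values should stay grouped by column instead of being pooled into one array, a dictionary comprehension is one possible alternative. This is a sketch rather than part of the original example, and the name unique_per_column is chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the article, repeated so the snippet is self-contained
data = {
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
    'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
    'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
    'E': ['E1', 'E1', 'E1', 'E1', 'E1'],
}
df = pd.DataFrame(data)

# Map each column name to a list of its unique values
unique_per_column = {col: list(df[col].unique()) for col in df.columns}

for col, values in unique_per_column.items():
    print(col, values)
```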
Using Series.drop_duplicates()
In this example, we remove duplicates from columns ‘A’ and ‘D’ using the drop_duplicates() method and assign the results back to the DataFrame. Because the assignment aligns on the index, the rows whose values were dropped from ‘D’ are filled with NaN. Column ‘A’ has no duplicates, so it is unchanged.
Python3
# Import pandas package
import pandas as pd

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Use drop_duplicates() to remove duplicates from columns 'A' and 'D'
df['A'] = df['A'].drop_duplicates()
df['D'] = df['D'].drop_duplicates()

# Print the DataFrame after removing duplicates from columns 'A' and 'D'
print(df)
Output
    A   B   C    D   E
0  A1  B1  C1   D1  E1
1  A2  B2  C2   D2  E1
2  A3  B3  C3  NaN  E1
3  A4  B4  C3  NaN  E1
4  A5  B4  C3  NaN  E1
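If only the unique values are needed, drop_duplicates() can also be used without assigning the result back, which avoids the NaN padding. The sketch below rebuilds only column ‘D’ and shows the keep parameter and reset_index(); the variable names are illustrative.

```python
import pandas as pd

# Rebuild only column 'D' from the sample data so the snippet is self-contained
df = pd.DataFrame({'D': ['D1', 'D2', 'D2', 'D2', 'D2']})

# Returns a shorter Series that keeps the original index labels
unique_d = df['D'].drop_duplicates()

# keep='last' retains the last occurrence of each value instead of the first
unique_d_last = df['D'].drop_duplicates(keep='last')

# reset_index(drop=True) renumbers the result starting from 0
unique_d_reset = unique_d_last.reset_index(drop=True)

print(unique_d)
print(unique_d_last)
print(unique_d_reset)
```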