Python | Pandas DataFrame.set_index()
In this article, we will discuss Pandas DataFrame.set_index() in Python. Python stands out as an excellent language for data analysis, largely due to its outstanding ecosystem of data-centric packages. Among these packages, Pandas plays a significant role in simplifying importing and analyzing data.
Pandas DataFrame.set_index() Syntax
Syntax: DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
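- keys: A column label, a list of column labels, or arrays (for example a Series, Index, or NumPy array) with the same length as the DataFrame.
- drop: Boolean, default True. If True, the columns used as the new index are removed from the DataFrame's columns.
- append: Boolean, default False. If True, the specified columns are appended to the existing index instead of replacing it.
- inplace: Boolean, default False. If True, the DataFrame is modified in place and the method returns None; otherwise a new DataFrame is returned.
- verify_integrity: Boolean, default False. If True, checks the new index for duplicates and raises an error if any are found.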
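The effect of the drop and verify_integrity parameters can be sketched with a small DataFrame. The column names and values below are invented for illustration and do not come from the article's CSV file.

```python
import pandas as pd

# Hypothetical sample data; 'id' deliberately contains a duplicate value.
df = pd.DataFrame({
    "id": [1, 2, 2],
    "name": ["Ann", "Bob", "Cy"],
})

# drop=True (the default) removes 'id' from the columns once it is the index.
dropped = df.set_index("id")

# drop=False keeps 'id' as a regular column in addition to the index.
kept = df.set_index("id", drop=False)

# verify_integrity=True raises ValueError because 'id' has duplicates.
try:
    df.set_index("id", verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc

print(list(dropped.columns))
print(list(kept.columns))
print(duplicate_error is not None)
```

Without verify_integrity=True, the duplicate labels are accepted silently, so the check is worth enabling when the index is expected to be unique.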
What is Pandas DataFrame.set_index()?
The Pandas DataFrame.set_index() method sets one or more existing columns, or arrays of matching length such as a Series, Index, or NumPy array, as the index of a DataFrame. An index can be specified when a DataFrame is created, but set_index() provides a flexible way to change it later. This is particularly useful when a DataFrame has been built by combining two or more DataFrames and the resulting index needs to be replaced.
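As a minimal sketch of passing something other than a column name, the snippet below uses a separate Series as the index and also mixes a column label with that Series. The Series named ‘label’ and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Geek1", "Geek2", "Geek3"],
                   "Age": [25, 30, 35]})

# A Series that is not a column of df (hypothetical labels).
# It must have the same length as the DataFrame.
labels = pd.Series(["x", "y", "z"], name="label")

# The Series values become the index; all columns of df are kept.
by_series = df.set_index(labels)

# A column label and an array can be combined in one list,
# which produces a two-level MultiIndex.
mixed = df.set_index(["Name", labels])

print(by_series.index.name)
print(list(mixed.index.names))
```

Only keys that are column labels are affected by the drop parameter, so ‘Name’ leaves the columns in the second call while the external Series never was one.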
The first two examples below read a CSV file named employees.csv.
Pandas DataFrame.set_index() Examples
The following examples illustrate Pandas DataFrame.set_index():
- Pandas Set Index to Column
- Pandas Set Index to Multiple Index Columns
- Setting a single Float column as Index
- Setting three columns as MultiIndex
- Pandas Set Index of Specific Column
Pandas Set Index to Column
In this example, we change the index: the First Name column is made the index of the DataFrame.
Python3
# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# setting first name as index column
data.set_index("First Name", inplace=True)

# display
data.head()
Output: Before the operation, the index is the default sequence of numbers. After the operation, it has been replaced with the First Name column.
Pandas Set Index to Multiple Index Columns
In this example, two columns are used as the index. The drop parameter controls whether the columns used for the index are removed from the DataFrame (here, drop=False keeps them), and the append parameter adds the passed columns to the already existing index instead of replacing it.
Python3
# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# appending first name and gender to the existing index,
# keeping both as regular columns as well
data.set_index(["First Name", "Gender"], inplace=True,
               append=True, drop=False)

# display
data.head()
Output: The DataFrame now has three index levels: the original integer index, First Name, and Gender.
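Because the output depends on employees.csv, the same call can be checked on a small stand-in DataFrame. This is only a sketch: the rows and the ‘Salary’ column below are invented and are not taken from the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of employee data.
data = pd.DataFrame({
    "First Name": ["Ann", "Bob", "Cy"],
    "Gender": ["F", "M", "M"],
    "Salary": [100, 200, 300],
})

# append=True keeps the default integer index as the first level;
# drop=False leaves both columns in the DataFrame.
result = data.set_index(["First Name", "Gender"], append=True, drop=False)

print(result.index.nlevels)
print(list(result.index.names))
print(list(result.columns))
```

The first level has no name because the default integer index is unnamed, which is why it appears as None in the list of index names.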
Pandas Dataframe Index Setting a single Float column as Index
In this example, the code uses Pandas to create a DataFrame named ‘df’ from a nested list of student data. It sets the ‘Agg_Marks’ column, which holds float values, as the index and displays the resulting DataFrame, whose remaining columns are ‘Name’, ‘Age’, ‘City’, and ‘Country’.
Python3
# importing pandas library
import pandas as pd

# creating and initializing a nested list
students = [['jack', 34, 'Sydeny', 'Australia', 85.96],
            ['Riti', 30, 'Delhi', 'India', 95.20],
            ['Vansh', 31, 'Delhi', 'India', 85.25],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21],
            ['Maychan', 16, 'New York', 'US', 99.63],
            ['Mike', 17, 'las vegas', 'US', 47.28]]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# here we set Float column 'Agg_Marks' as index of data frame
# using dataframe.set_index() function
df = df.set_index('Agg_Marks')

# Displaying the Data frame
df
Output:
              Name  Age       City    Country
Agg_Marks
85.96         jack   34     Sydeny  Australia
95.20         Riti   30      Delhi      India
85.25        Vansh   31      Delhi      India
74.21        Nanyu   32      Tokyo      Japan
99.63      Maychan   16   New York         US
47.28         Mike   17  las vegas         US
In the above example, we set the column ‘Agg_Marks’ as the index of the data frame.
Pandas Dataframe Index Setting three columns as MultiIndex
In this example, the code uses Pandas to create a DataFrame ‘df’ from student data, with columns ‘Name’, ‘Age’, ‘City’, ‘Country’, ‘Agg_Marks’, and ‘ID’. It passes a list of three columns to set_index() to build a multi-level index from ‘Name’, ‘City’, and ‘ID’, and then displays the resulting DataFrame.
Python3
# importing pandas library
import pandas as pd

# creating and initializing a nested list
students = [['jack', 34, 'Sydeny', 'Australia', 85.96, 400],
            ['Riti', 30, 'Delhi', 'India', 95.20, 750],
            ['Vansh', 31, 'Delhi', 'India', 85.25, 101],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21, 900],
            ['Maychan', 16, 'New York', 'US', 99.63, 420],
            ['Mike', 17, 'las vegas', 'US', 47.28, 555]]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks', 'ID'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Here we pass list of 3 columns i.e 'Name', 'City' and 'ID'
# to dataframe.set_index() function
# to set them as multiIndex of dataframe
df = df.set_index(['Name', 'City', 'ID'])

# Displaying the Data frame
df
Output:
                       Age    Country  Agg_Marks
Name    City      ID
jack    Sydeny    400   34  Australia      85.96
Riti    Delhi     750   30      India      95.20
Vansh   Delhi     101   31      India      85.25
Nanyu   Tokyo     900   32      Japan      74.21
Maychan New York  420   16         US      99.63
Mike    las vegas 555   17         US      47.28
In the above example, we set the columns ‘Name’, ‘City’, and ‘ID’ as the MultiIndex of the data frame.
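Once several columns form the index, each row is addressed by a tuple with one value per level. The following sketch reuses three rows from the student data to inspect the level names and look up a single row; the variable names level_names and row are illustrative only.

```python
import pandas as pd

# Three rows reused from the student data in the example.
students = [['Riti', 30, 'Delhi', 'India', 95.20, 750],
            ['Vansh', 31, 'Delhi', 'India', 85.25, 101],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21, 900]]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks', 'ID'])
df = df.set_index(['Name', 'City', 'ID'])

# The index levels are named after the columns they came from.
level_names = list(df.index.names)

# A full key is a tuple with one value per index level.
row = df.loc[('Riti', 'Delhi', 750), :]

print(level_names)
print(row['Age'])
```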
Pandas Set Index of Specific Column
In this example, the code creates a DataFrame, sets a specific column (‘Name’ in this case) as the index using the set_index() method, and prints both the original and the modified DataFrame. The inplace=True parameter applies the change directly to the DataFrame, so no reassignment is needed.
Python3
import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Geek1', 'Geek2', 'Geek3'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Using set_index() to set 'Name' column as the index
df.set_index('Name', inplace=True)

# Displaying the DataFrame after setting the index
print("\nDataFrame after set_index:")
print(df)
Output:
Original DataFrame:
    Name  Age           City
0  Geek1   25       New York
1  Geek2   30  San Francisco
2  Geek3   35    Los Angeles

DataFrame after set_index:
       Age           City
Name
Geek1   25       New York
Geek2   30  San Francisco
Geek3   35    Los Angeles
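The difference between reassignment and inplace=True can be sketched as follows. The variable names new_df, columns_before, and returned are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Geek1', 'Geek2'], 'Age': [25, 30]})

# Without inplace, set_index() returns a new DataFrame and leaves df unchanged.
new_df = df.set_index('Name')
columns_before = list(df.columns)

# With inplace=True, df itself is modified and the call returns None.
returned = df.set_index('Name', inplace=True)

print(columns_before)
print(returned)
print(list(df.columns))
```

Since the in-place call returns None, writing df = df.set_index('Name', inplace=True) would replace the DataFrame with None, so the two styles should not be combined.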