Flattening hierarchical index in pandas dataframe using groupby

Whenever we use the groupby function and apply multiple aggregation functions to a column, the resulting dataframe gets hierarchical (multi-level) column labels: the first level holds the original column name and the second level holds the aggregation type. In such cases, the two levels have to be flattened into a single level of column names.
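To make the problem concrete, here is a minimal sketch of how the two-level column labels appear. The dataframe and its values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    "cars": ["bmw", "bmw", "audi"],
    "sales": [10, 20, 5],
})

# Several aggregations on one column produce two-level column labels
grouped = df.groupby("cars").agg({"sales": ["sum", "max"]})

print(grouped.columns.nlevels)  # number of column levels
print(list(grouped.columns))    # each label is a (column, aggregation) tuple
```

Each column label is a tuple such as ('sales', 'sum'), which is what the flattening step later joins into a single string.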

Syntax: pandas.DataFrame.groupby(by=None, axis=0, level=None)

Parameters:

  • by – a mapping, function, column label or list of labels that determines the groups.
  • axis – 0 splits along rows and 1 splits along columns. (Deprecated in recent pandas versions.)
  • level – if the axis is multi-indexed, groups by the specified level or levels. (int, level name or a sequence of these)
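The by and level parameters can be sketched as follows. The dataframe, its column names and its values are assumptions chosen for illustration; axis is left at its default because recent pandas releases deprecate it.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    "cars": ["bmw", "audi", "bmw", "audi"],
    "year": [2020, 2020, 2021, 2021],
    "sales": [10, 5, 20, 7],
})

# by: group rows on the values of a column label
by_label = df.groupby(by="cars")["sales"].sum()

# level: group on one level of a MultiIndex (here, the "year" level)
indexed = df.set_index(["cars", "year"])
by_level = indexed.groupby(level="year")["sales"].sum()

print(by_label)
print(by_level)
```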

Syntax: pandas.DataFrame.agg(func=None, axis=0)

Parameters:

  • func – the aggregation to apply: a function, a function name as a string ("min", "max", "sum" etc.), a list of these, or a dict mapping column labels to them.
  • axis – 0 applies the function to each column and 1 applies it to each row.
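A small sketch of how func and axis behave, using a made-up two-column dataframe (the names q1 and q2 are illustrative only):

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({"q1": [1, 2, 3], "q2": [4, 5, 6]})

# func as a single name: one value per column (axis=0 is the default)
col_sums = df.agg("sum")

# axis=1: the function is applied across each row instead
row_sums = df.agg("sum", axis=1)

# func as a list: one result row per function
both = df.agg(["min", "max"])

print(col_sums)
print(row_sums)
print(both)
```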

Example

Import the pandas package. Create a sample dataframe showing car sales in two quarters, q1 and q2. Now use the pandas groupby function to group by car and compute the sum and max of sales in quarter 1 and the sum and min of sales in quarter 2. The grouped dataframe has multi-indexed columns, where each column label is a tuple. Use a for loop to iterate through the tuples, join each one into a single string, and append the joined strings to the flat_cols list. Finally, assign the flat_cols list to the column names of the grouped dataframe.

Python3




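# import the pandas package
import pandas as pd

# sample car sales for two quarters
# (illustrative values; the original data is not shown)
data = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1 in Cr": [20, 22, 24, 26],
    "sale_q2 in Cr": [11, 13, 15, 17]})

# group by cars based on
# the sum and max of sales on quarter 1
# and the sum and min of sales on quarter 2
# (string names avoid the warning that recent
# pandas versions emit for the builtins sum/min/max)
grouped_data = data.groupby(by="cars").agg(
    {"sale_q1 in Cr": ["sum", "max"],
     "sale_q2 in Cr": ["sum", "min"]})

# create an empty list to save the
# names of the flattened columns
flat_cols = []

# iterate through the column tuples and
# join each one into a single string
for i in grouped_data.columns:
    flat_cols.append(i[0] + '_' + i[1])

# now assign the list of flattened
# column names to the grouped columns
grouped_data.columns = flat_cols

# print the grouped data
print(grouped_data)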


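Output: the grouped dataframe is printed with a single level of column names: sale_q1 in Cr_sum, sale_q1 in Cr_max, sale_q2 in Cr_sum and sale_q2 in Cr_min.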
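The loop can also be written more compactly with join() and rstrip(). The sketch below is a common alternative rather than the code used above, and its sample values are assumptions for illustration. It also covers the case where reset_index() has turned the group key into a column with an empty second level.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration
df = pd.DataFrame({
    "cars": ["bmw", "bmw", "audi"],
    "sale_q1 in Cr": [20, 22, 25],
    "sale_q2 in Cr": [26, 13, 45],
})

grouped = df.groupby("cars").agg({
    "sale_q1 in Cr": ["sum", "max"],
    "sale_q2 in Cr": ["sum", "min"],
}).reset_index()

# After reset_index(), "cars" has an empty second level: ("cars", "").
# rstrip("_") removes the trailing underscore that join would leave behind.
grouped.columns = ["_".join(col).rstrip("_") for col in grouped.columns]

print(list(grouped.columns))
```

Joining with "_" keeps both pieces of information (source column and aggregation) in the flat name, while rstrip("_") keeps the key column named plain cars instead of cars_.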
