GroupBy in Python Pandas

A groupby operation in Pandas splits an object into groups, applies a function to each group, and then combines the results.

After grouping the rows by the columns of our choice, we can perform various operations on each group, which helps in analyzing the data.
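
To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and then with a single groupby call. The sales DataFrame and its city and units columns are hypothetical and are used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
sales = pd.DataFrame(
    {
        "city": ["Delhi", "Mumbai", "Delhi", "Mumbai", "Pune"],
        "units": [10, 4, 6, 8, 5],
    }
)

# Manual split-apply-combine
pieces = {}
for city in sales["city"].unique():
    subset = sales[sales["city"] == city]   # split: rows of one city
    pieces[city] = subset["units"].sum()    # apply: total units for that city
manual = pd.Series(pieces).sort_index()     # combine: one value per city

# The same three steps expressed as one groupby call
grouped = sales.groupby("city")["units"].sum()

print(grouped)
```

Both approaches should give the same totals per city; the groupby version is shorter and leaves the looping to Pandas.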

Syntax

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=<object object>, observed=False, dropna=True)

Parameters

  • by: The column label or list of labels that determines the groups. It lets us group by one or several columns of the DataFrame.
  • axis: The axis to split along. The default value is 0, where 0 stands for the index (rows) and 1 stands for the columns.
  • level: If the DataFrame has a hierarchical index (MultiIndex), level selects the index level or levels to group by.
  • as_index: A boolean with a default value of True. For aggregated output, it returns an object with the group labels as the index; with False, the group labels are returned as regular columns.
  • sort: A boolean with a default value of True that sorts the group keys. Setting it to False can improve performance when sorted keys are not needed.
  • group_keys: A boolean with a default value of True. When calling apply, it adds the group keys to the index to identify the pieces.
  • dropna: A boolean with a default value of True. If True, rows whose group key is NA are dropped; if False, NA is treated as a group key of its own.
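
The effect of a few of these parameters is easier to see side by side. The following sketch uses a hypothetical DataFrame named df_params, with made-up team and score columns and one missing key, to compare the defaults with as_index=False, sort=False, and dropna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing group key, invented for illustration
df_params = pd.DataFrame(
    {
        "team": ["B", "A", np.nan, "B", "A"],
        "score": [10, 20, 30, 40, 50],
    }
)

# Defaults: keys are sorted, the NaN key is dropped, keys become the index
default_result = df_params.groupby("team")["score"].sum()

# as_index=False keeps "team" as a regular column instead of the index
flat_result = df_params.groupby("team", as_index=False)["score"].sum()

# sort=False keeps the groups in order of first appearance ("B" before "A")
unsorted_result = df_params.groupby("team", sort=False)["score"].sum()

# dropna=False keeps rows with a missing key as a group of their own
with_na = df_params.groupby("team", dropna=False)["score"].sum()

print(default_result)
print(flat_result)
print(unsorted_result)
print(with_na)
```

Note that the dropna parameter is only available in Pandas 1.1 and later.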

Example 1: Using Groupby with DataFrame

First, let’s create a DataFrame on which we will perform the groupby operation.

Python3
# importing pandas library
import pandas as pd

# Creating pandas dataframe
df = pd.DataFrame(
    [
        ("Corona Positive", 65, 99),
        ("Corona Negative", 52, 98.7),
        ("Corona Positive", 43, 100.1),
        ("Corona Positive", 26, 99.6),
        ("Corona Negative", 30, 98.1),
    ],
    
    index=["Patient 1", "Patient 2", "Patient 3",
           "Patient 4", "Patient 5"],
    
    columns=("Status", "Age(in Years)", "Temperature"),
)

# show dataframe
print(df)

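Output:

                    Status  Age(in Years)  Temperature
Patient 1  Corona Positive             65         99.0
Patient 2  Corona Negative             52         98.7
Patient 3  Corona Positive             43        100.1
Patient 4  Corona Positive             26         99.6
Patient 5  Corona Negative             30         98.1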

Now let us group them according to some features:

Python3
# Grouping with only status 
grouped1 = df.groupby("Status")

# Grouping with temperature and status
grouped2 = df.groupby(["Temperature", "Status"])
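
A GroupBy object does not display its groups when it is created. As a minimal sketch, the attributes below can be used to check which groups were formed. The snippet rebuilds the same patient data under the assumed names patients and by_status so that it runs on its own.

```python
import pandas as pd

# Same patient data as in the example above, rebuilt so this snippet is self-contained
patients = pd.DataFrame(
    [
        ("Corona Positive", 65, 99),
        ("Corona Negative", 52, 98.7),
        ("Corona Positive", 43, 100.1),
        ("Corona Positive", 26, 99.6),
        ("Corona Negative", 30, 98.1),
    ],
    index=["Patient 1", "Patient 2", "Patient 3", "Patient 4", "Patient 5"],
    columns=["Status", "Age(in Years)", "Temperature"],
)

by_status = patients.groupby("Status")

print(by_status.ngroups)   # number of groups
print(by_status.size())    # number of rows in each group
print(by_status.groups)    # mapping of group key to row labels

# Pull out the rows of a single group
print(by_status.get_group("Corona Negative"))

# With two keys, each distinct (Temperature, Status) pair is its own group
by_temp_status = patients.groupby(["Temperature", "Status"])
print(by_temp_status.ngroups)
```

Since every patient here has a different temperature, the two-key grouping places each row in a group of its own.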

As we can see, we have grouped the rows by ‘Status’ alone and by ‘Temperature’ and ‘Status’ together. Let us now apply some functions to the groups:

Example 2: Finding the mean of a Group

This computes the mean of the numeric columns for each ‘Status’ group.

Python3
# Finding the mean of the
# patients' reports according to
# the status
print(grouped1.mean())
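
The mean is only one of the functions that can be applied to a group. As a hedged sketch, the agg method can compute several statistics in one call. The output column names mean_age, max_temp, and n_patients are chosen here for illustration, and the patient data is rebuilt under the assumed name patients so that the snippet runs on its own.

```python
import pandas as pd

# Same patient data as in Example 1, rebuilt so this snippet is self-contained
patients = pd.DataFrame(
    [
        ("Corona Positive", 65, 99),
        ("Corona Negative", 52, 98.7),
        ("Corona Positive", 43, 100.1),
        ("Corona Positive", 26, 99.6),
        ("Corona Negative", 30, 98.1),
    ],
    index=["Patient 1", "Patient 2", "Patient 3", "Patient 4", "Patient 5"],
    columns=["Status", "Age(in Years)", "Temperature"],
)

# Named aggregation: output_column=(input_column, function)
summary = patients.groupby("Status").agg(
    mean_age=("Age(in Years)", "mean"),
    max_temp=("Temperature", "max"),
    n_patients=("Age(in Years)", "count"),
)

print(summary)
```

Each row of summary corresponds to one status, and each column holds one statistic for that group.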

Pandas – Multi-index and Groupby Tutorial

Multi-index and Groupby are important concepts in data manipulation. Multi-index allows you to represent data with multiple levels of indexing, creating a hierarchy in rows and columns.

Groupby lets you create groups of similar data and apply aggregate functions (e.g., mean, sum, count, standard deviation) to each group, condensing large datasets into meaningful summaries.

Used together, these tools let you summarize data at several levels of detail, for example per group and per subgroup.

In this article, we will discuss Multi-index for Pandas Dataframe and Groupby operations.
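
As a small sketch of how the two ideas meet, grouping by more than one column produces a result indexed by a MultiIndex, and that result can be grouped again on one of its levels. The records DataFrame and its region, product, and units columns are hypothetical and are used only for illustration.

```python
import pandas as pd

# Hypothetical sales records, invented for this illustration
records = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "South"],
        "product": ["Pen", "Ink", "Pen", "Pen", "Ink"],
        "units": [3, 5, 2, 4, 1],
    }
)

# Grouping by two columns returns a Series with a two-level MultiIndex
totals = records.groupby(["region", "product"])["units"].sum()
print(totals)

# Select one entry by giving a label for each index level
north_pen = totals.loc[("North", "Pen")]
print(north_pen)

# Group again on a single level of the MultiIndex
per_region = totals.groupby(level="region").sum()
print(per_region)
```

The level argument here plays the role described for the level parameter of groupby: it names the index level to group by instead of a column.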
