How to use describe() and Percentile In Python

In data analysis, the describe() and quantile() functions help us understand the distribution and summary statistics of a dataset. The describe() function reports statistical properties such as count, mean, standard deviation, min, and max for the numerical columns. A percentile is the value below which a given percentage of the data falls, so percentiles such as 20%, 40%, 60%, and 80% split the dataset into intervals. In some scenarios, our analysis needs these percentile values to be included in the summary given by describe(). In such cases, we can pass them to describe() through its percentiles parameter.

Example: In the code below, we get the summary using the describe() function and then calculate the required percentiles separately with quantile().

Python3

import pandas as pd 
# Sample DataFrame 
data = {'Value': [10, 15, 20, 25, 30, 35, 40, 45, 50]} 
df = pd.DataFrame(data) 
# Using describe to get summary statistics 
summary_stats = df['Value'].describe() 
print(summary_stats) 
# Calculate specific percentiles 
percentile_25 = df['Value'].quantile(0.25) 
percentile_50 = df['Value'].quantile(0.50)  # Median 
percentile_75 = df['Value'].quantile(0.75) 
print(f'25th Percentile: {percentile_25}') 
print(f'50th Percentile (Median): {percentile_50}') 
print(f'75th Percentile: {percentile_75}') 

Output:

count     9.000000
mean     30.000000
std      13.693064
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
Name: Value, dtype: float64
25th Percentile: 20.0
50th Percentile (Median): 30.0
75th Percentile: 40.0
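The three separate quantile() calls above can also be collapsed into one. The following is a minimal sketch, assuming the same sample DataFrame: quantile() accepts a list of fractions and returns a Series indexed by those fractions. The variable name quartiles is only illustrative.

```python
import pandas as pd

# Same sample DataFrame as in the example above
data = {'Value': [10, 15, 20, 25, 30, 35, 40, 45, 50]}
df = pd.DataFrame(data)

# Passing a list to quantile() returns a Series indexed by the requested fractions
quartiles = df['Value'].quantile([0.25, 0.50, 0.75])

# Print each requested percentile with its value
for q, value in quartiles.items():
    print(f'{int(q * 100)}th Percentile: {value}')
```

This keeps the requested fractions and their results together in one object, which is convenient when the list of percentiles is built at runtime.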

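Combining describe and percentile: We can shorten the code by passing the percentiles directly to describe() through its percentiles parameter. The output then contains the requested percentile values alongside the rest of the summary. Note that [0.25, 0.5, 0.75] is also the default value of this parameter, so the call below produces the same output as a plain describe(). Let’s see the code.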

Python3

import pandas as pd 
# Sample DataFrame 
data = {'Value': [10, 15, 20, 25, 30, 35, 40, 45, 50]} 
df = pd.DataFrame(data) 
# Using describe to get summary statistics including specific percentiles 
summary_stats = df['Value'].describe(percentiles=[0.25, 0.5, 0.75]) 
print(summary_stats)

Output:

count     9.000000
mean     30.000000
std      13.693064
min      10.000000
25%      20.000000
50%      30.000000
75%      40.000000
max      50.000000
Name: Value, dtype: float64
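The percentiles parameter becomes more useful with values other than the default quartiles. The sketch below assumes the same sample DataFrame and requests the 20%, 40%, 60%, and 80% percentiles mentioned earlier. The name custom_stats is illustrative. Depending on the pandas version, a 50% (median) row may be included in the output as well.

```python
import pandas as pd

# Same sample DataFrame as in the examples above
data = {'Value': [10, 15, 20, 25, 30, 35, 40, 45, 50]}
df = pd.DataFrame(data)

# Request the 20th, 40th, 60th and 80th percentiles instead of the default quartiles
custom_stats = df['Value'].describe(percentiles=[0.2, 0.4, 0.6, 0.8])
print(custom_stats)

# Individual entries are labelled by percentage and can be looked up by that label
print(custom_stats['20%'])
```

Because describe() returns a Series here, any single statistic can be pulled out by its label without a separate quantile() call.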

10 Python Pandas tips to make data analysis faster

Data analysis using Python’s Pandas library is a powerful process, and its efficiency can be enhanced with specific tricks and techniques. These Python tips will make our code concise, readable, and efficient. The adaptability of Pandas makes it an efficient tool for working with structured data. Whether you are a beginner or an experienced data scientist, mastering these Python tips can help you enhance your efficiency in data analysis tasks.

Pandas tips for Data Analysis

In this article, we will explore 10 Python Pandas tips that make data analysis faster and our work easier.

Table of Contents

Use Vectorized Operation
Optimize Memory Usage
Method Chaining
Use GroupBy Aggregations
Using describe() and Percentile
Leverage the Power of pd.cut and pd.qcut
Optimize DataFrame Merging
Use isin for Filtering
Profile Code with ydata_profiling
Conclusion
