Optimize Memory Usage
We can optimize memory usage by choosing appropriate data types for columns, which can significantly reduce the amount of memory a DataFrame consumes. Let's discuss this with an example.
Suppose we have a DataFrame with a column named 'column' containing floating-point numbers. By default, Pandas uses the float64 data type to represent these numbers. However, if the precision of float32 is sufficient for our data, we can reduce the memory footprint by converting the column to float32.
Python3
import pandas as pd

# Sample DataFrame
data = {'column': [1.0, 2.5, 3.8, 4.2, 5.6]}
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
print(df)
print(df.info())

# Check memory usage
print("\nMemory usage before optimization:")
print(df.memory_usage())
Output:
Original DataFrame:
column
0 1.0
1 2.5
2 3.8
3 4.2
4 5.6
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column 5 non-null float64
dtypes: float64(1)
memory usage: 168.0 bytes
None
Memory usage before optimization:
Index 128
column 40
dtype: int64
The above output shows that 'column' is of type float64 and the DataFrame uses 168 bytes in total (40 bytes for the column itself). Now, let's optimize the memory usage by converting the column to float32:
Python3
# Optimize memory usage by downcasting the column to float32
df['column'] = df['column'].astype('float32')

# Updated DataFrame
print("\nDataFrame after optimizing memory usage:")
print(df)
print(df.info())

# Check memory usage after optimization
print("\nMemory usage after optimization:")
print(df.memory_usage())
Output:
DataFrame after optimizing memory usage:
column
0 1.0
1 2.5
2 3.8
3 4.2
4 5.6
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column 5 non-null float32
dtypes: float32(1)
memory usage: 148.0 bytes
None
Memory usage after optimization:
Index 128
column 20
dtype: int64
After converting the column to float32, total memory usage dropped from 168 to 148 bytes: the column's own footprint was halved from 40 to 20 bytes. This demonstrates a significant reduction in memory consumption while still maintaining the required level of precision.
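On a five-row frame the savings are small, but the same halving applies at scale. As a minimal sketch (using a hypothetical 100,000-row column, not data from this article), `pd.to_numeric` with `downcast='float'` can pick the smallest float dtype automatically instead of hard-coding float32:

```python
import numpy as np
import pandas as pd

# Hypothetical larger DataFrame: 100,000 values stored as float64 by default
df = pd.DataFrame({'values': np.arange(100_000, dtype='float64')})

# Memory used by the column alone, excluding the index
before = df['values'].memory_usage(index=False)

# downcast='float' converts to the smallest float dtype (at minimum float32)
# that can still hold the data
df['values'] = pd.to_numeric(df['values'], downcast='float')

after = df['values'].memory_usage(index=False)

print(df['values'].dtype)  # float32
print(before, after)       # 800000 400000 -> footprint halved
```

Unlike a manual `astype('float32')`, `to_numeric` with `downcast` leaves the column unchanged if no smaller dtype can represent the values, which makes it a safer default when optimizing many columns at once.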
10 Python Pandas tips to make data analysis faster
Data analysis with Python's Pandas library is a powerful process, and its efficiency can be enhanced with specific tricks and techniques. These tips will make our code more concise, readable, and efficient. The adaptability of Pandas makes it an effective tool for working with structured data, and whether you are a beginner or an experienced data scientist, mastering these tips can improve your efficiency in data analysis tasks.
In this article, we will explore 10 Python Pandas tips that make data analysis faster and our work easier.
Table of Content
- Use Vectorized Operation
- Optimize Memory Usage
- Method Chaining
- Use GroupBy Aggregations
- Using describe() and Percentile
- Leverage the Power of pd.cut and pd.qcut
- Optimize DataFrame Merging
- Use isin for Filtering
- Profile Code with ydata_profiling
- Conclusion