Optimize Memory Usage
We can optimize memory usage by choosing appropriate data types for columns, which can significantly reduce the amount of memory a DataFrame consumes. Let's discuss this with an example.
Suppose we have a DataFrame with a column named 'column' containing floating-point numbers. By default, Pandas uses the float64 data type to represent these numbers. However, if the precision of float32 is sufficient for our data, we can reduce the memory footprint by converting the column to float32.
Python3
import pandas as pd

# Sample DataFrame
data = {'column': [1.0, 2.5, 3.8, 4.2, 5.6]}
df = pd.DataFrame(data)

# Original DataFrame
print("Original DataFrame:")
print(df)
print(df.info())

# Check memory usage
print("\nMemory usage before optimization:")
print(df.memory_usage())
Output:
Original DataFrame:
column
0 1.0
1 2.5
2 3.8
3 4.2
4 5.6
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column 5 non-null float64
dtypes: float64(1)
memory usage: 168.0 bytes
None
Memory usage before optimization:
Index 128
column 40
dtype: int64
The above output shows that 'column' is of type float64 and the DataFrame uses 168 bytes in total (40 bytes for the column itself). Now, let's optimize the memory usage by converting the column to float32:
Python3
# Optimize memory usage by downcasting the column to float32
df['column'] = df['column'].astype('float32')

# Updated DataFrame
print("\nDataFrame after optimizing memory usage:")
print(df)
print(df.info())

# Check memory usage after optimization
print("\nMemory usage after optimization:")
print(df.memory_usage())
Output:
DataFrame after optimizing memory usage:
column
0 1.0
1 2.5
2 3.8
3 4.2
4 5.6
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 column 5 non-null float32
dtypes: float32(1)
memory usage: 148.0 bytes
None
Memory usage after optimization:
Index 128
column 20
dtype: int64
After converting the column to float32, total memory usage dropped from 168 to 148 bytes: the column's own footprint was halved from 40 to 20 bytes. This demonstrates a significant reduction in memory consumption while still maintaining the required level of precision.
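On a five-row frame the savings are small, but the same halving applies at scale. As a minimal sketch (using a hypothetical 100,000-row column, not data from this article), `pd.to_numeric` with `downcast='float'` can pick the smallest float dtype automatically instead of hard-coding float32:

```python
import numpy as np
import pandas as pd

# Hypothetical larger DataFrame: 100,000 values stored as float64 by default
df = pd.DataFrame({'values': np.arange(100_000, dtype='float64')})

# Memory used by the column alone, excluding the index
before = df['values'].memory_usage(index=False)

# downcast='float' converts to the smallest float dtype (at minimum float32)
# that can still hold the data
df['values'] = pd.to_numeric(df['values'], downcast='float')

after = df['values'].memory_usage(index=False)

print(df['values'].dtype)  # float32
print(before, after)       # 800000 400000 -> footprint halved
```

Unlike a manual `astype('float32')`, `to_numeric` with `downcast` leaves the column unchanged if no smaller dtype can represent the values, which makes it a safer default when optimizing many columns at once.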
10 Python Pandas tips to make data analysis faster
Data analysis with Python's Pandas library is a powerful process, and its efficiency can be enhanced with specific tricks and techniques. These tips will make our code more concise, readable, and efficient. The adaptability of Pandas makes it an effective tool for working with structured data, and whether you are a beginner or an experienced data scientist, mastering these tips can improve your efficiency in data analysis tasks.
In this article, we will explore 10 Python Pandas tips that make data analysis faster and our work easier.
Table of Content
- Use Vectorized Operation
- Optimize Memory Usage
- Method Chaining
- Use GroupBy Aggregations
- Using describe() and Percentile
- Leverage the Power of pd.cut and pd.qcut
- Optimize DataFrame Merging
- Use isin for Filtering
- Profile Code with ydata_profiling
- Conclusion