Getting Started with Polars: Implementation
Installing Polars
Before diving into examples, you need to install Polars. You can do this using pip:
pip install polars
Creating a DataFrame
Creating a DataFrame in Polars is straightforward. You can create a DataFrame from a dictionary, list of lists, or even from a CSV file.
import polars as pl
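# Create a sample dataset as a list of rows
data = [["John", 25, "Male"], ["Alice", 30, "Female"], ["Bob", 28, "Male"]]
# orient="row" states explicitly that each inner list is a row, not a column
df = pl.DataFrame(data, schema=["Name", "Age", "Gender"], orient="row")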
# Basic data exploration
print(df)
Output:
shape: (3, 3)
┌───────┬─────┬────────┐
│ Name  ┆ Age ┆ Gender │
│ ---   ┆ --- ┆ ---    │
│ str   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ John  ┆ 25  ┆ Male   │
│ Alice ┆ 30  ┆ Female │
│ Bob   ┆ 28  ┆ Male   │
└───────┴─────┴────────┘
Basic DataFrame Operations
Polars provides a rich set of functions for data manipulation. Here are some common operations:
1. Filtering and Aggregation
To filter rows based on a condition, use the filter method:
# Filtering and aggregation
male_ages = df.filter(pl.col("Gender") == "Male").select("Age")
average_male_age = male_ages.mean()
print(male_ages)
print(average_male_age)
Output:
shape: (2, 1)
┌─────┐
│ Age │
│ --- │
│ i64 │
╞═════╡
│ 25  │
│ 28  │
└─────┘
shape: (1, 1)
┌──────┐
│ Age  │
│ ---  │
│ f64  │
╞══════╡
│ 26.5 │
└──────┘
2. Concatenating DataFrames
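To stack DataFrames on top of each other, use the pl.concat function:
# Concatenating DataFrames
more_data = [["Charlie", 22, "Male"], ["Diana", 26, "Female"]]
another_df = pl.DataFrame(more_data, schema=["Name", "Age", "Gender"], orient="row")
# how="diagonal" aligns columns by name; how="vertical" also works here because the schemas match
combined_df = pl.concat([df, another_df], how="diagonal")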
print(combined_df)
Output:
shape: (5, 3)
┌─────────┬─────┬────────┐
│ Name    ┆ Age ┆ Gender │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ str    │
╞═════════╪═════╪════════╡
│ John    ┆ 25  ┆ Male   │
│ Alice   ┆ 30  ┆ Female │
│ Bob     ┆ 28  ┆ Male   │
│ Charlie ┆ 22  ┆ Male   │
│ Diana   ┆ 26  ┆ Female │
└─────────┴─────┴────────┘
3. Grouping and Aggregation
To group by a column and perform aggregation, use the group_by and agg methods. Older releases spelled this groupby, which is deprecated in favor of group_by:
# Grouping and aggregation
# maintain_order=True keeps groups in order of first appearance, so the output is stable
grouped_df = combined_df.group_by("Gender", maintain_order=True).agg(
    pl.col("Age").mean().alias("Average Age")
)
print(grouped_df)
Output:
shape: (2, 2)
┌────────┬─────────────┐
│ Gender ┆ Average Age │
│ ---    ┆ ---         │
│ str    ┆ f64         │
╞════════╪═════════════╡
│ Male   ┆ 25.0        │
│ Female ┆ 28.0        │
└────────┴─────────────┘
4. Selecting Columns
To select specific columns, you can use the select method:
# Select the "Name" and "Age" columns
df_selected = df.select(["Name", "Age"])
print(df_selected)
Output:
shape: (3, 2)
┌───────┬─────┐
│ Name  ┆ Age │
│ ---   ┆ --- │
│ str   ┆ i64 │
╞═══════╪═════╡
│ John  ┆ 25  │
│ Alice ┆ 30  │
│ Bob   ┆ 28  │
└───────┴─────┘
5. Adding New Columns
To add a new column, use the with_columns method (the older singular with_column spelling is no longer available in current releases):
# Add a new column "Age_in_5_years" which is Age + 5
df = df.with_columns((pl.col("Age") + 5).alias("Age_in_5_years"))
print(df)
Output:
shape: (3, 4)
┌───────┬─────┬────────┬────────────────┐
│ Name  ┆ Age ┆ Gender ┆ Age_in_5_years │
│ ---   ┆ --- ┆ ---    ┆ ---            │
│ str   ┆ i64 ┆ str    ┆ i64            │
╞═══════╪═════╪════════╪════════════════╡
│ John  ┆ 25  ┆ Male   ┆ 30             │
│ Alice ┆ 30  ┆ Female ┆ 35             │
│ Bob   ┆ 28  ┆ Male   ┆ 33             │
└───────┴─────┴────────┴────────────────┘
6. Sorting Data
To sort the DataFrame by a specific column, use the sort method:
# Sort by "Age" in descending order (the keyword is descending; older releases used reverse)
df_sorted = df.sort("Age", descending=True)
print(df_sorted)
Output:
shape: (3, 4)
┌───────┬─────┬────────┬────────────────┐
│ Name  ┆ Age ┆ Gender ┆ Age_in_5_years │
│ ---   ┆ --- ┆ ---    ┆ ---            │
│ str   ┆ i64 ┆ str    ┆ i64            │
╞═══════╪═════╪════════╪════════════════╡
│ Alice ┆ 30  ┆ Female ┆ 35             │
│ Bob   ┆ 28  ┆ Male   ┆ 33             │
│ John  ┆ 25  ┆ Male   ┆ 30             │
└───────┴─────┴────────┴────────────────┘
Mastering Polars: High-Efficiency Data Analysis and Manipulation
Efficient data manipulation and analysis are central to data science and analytics. pandas has long been the default DataFrame library for Python, but Polars, a newer DataFrame library written in Rust, is gaining attention for its speed and memory efficiency. This article introduces Polars, highlights its main features, and walks through practical examples to get you started.
Table of Contents
- Understanding Polars Library
- Why is Polars Used for Data Science?
- Getting Started with Polars: Implementation
- Advanced Features: Parallel Processing and Lazy Evaluation
- Integration with Other Libraries
- Advantages and Disadvantages of Polars