Getting Started with Polars: Implementation

Installing Polars

Before diving into examples, you need to install Polars. You can do this using pip:

pip install polars

Creating a DataFrame

Creating a DataFrame in Polars is straightforward. You can create a DataFrame from a dictionary, list of lists, or even from a CSV file.

Python

import polars as pl

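# Create a sample dataset: each inner list is one row
data = [["John", 25, "Male"], ["Alice", 30, "Female"], ["Bob", 28, "Male"]]
# orient="row" states explicitly that the inner lists are rows, not columns
df = pl.DataFrame(data, schema=["Name", "Age", "Gender"], orient="row")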

# Basic data exploration
print(df)

Output:

shape: (3, 3)
┌───────┬─────┬────────┐
│ Name  ┆ Age ┆ Gender │
│ ---   ┆ --- ┆ ---    │
│ str   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ John  ┆ 25  ┆ Male   │
│ Alice ┆ 30  ┆ Female │
│ Bob   ┆ 28  ┆ Male   │
└───────┴─────┴────────┘

Basic DataFrame Operations

Polars provides a rich set of functions for data manipulation. Here are some common operations:

1. Filtering and Aggregation

To filter rows based on a condition, use the filter method:

Python

# Filtering and aggregation
male_ages = df.filter(pl.col("Gender") == "Male").select("Age")
average_male_age = male_ages.mean()

print(male_ages)
print(average_male_age)

Output:

shape: (2, 1)
┌─────┐
│ Age │
│ --- │
│ i64 │
╞═════╡
│ 25  │
│ 28  │
└─────┘
shape: (1, 1)
┌──────┐
│ Age  │
│ ---  │
│ f64  │
╞══════╡
│ 26.5 │
└──────┘

2. Concatenating DataFrames

Python

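# Concatenating DataFrames
more_data = [["Charlie", 22, "Male"], ["Diana", 26, "Female"]]
another_df = pl.DataFrame(more_data, schema=["Name", "Age", "Gender"], orient="row")

# how="diagonal" stacks rows and fills any column missing from one frame with nulls;
# with identical columns, as here, the result matches a plain vertical concat
combined_df = pl.concat([df, another_df], how="diagonal")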
print(combined_df)

Output:

shape: (5, 3)
┌─────────┬─────┬────────┐
│ Name    ┆ Age ┆ Gender │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ str    │
╞═════════╪═════╪════════╡
│ John    ┆ 25  ┆ Male   │
│ Alice   ┆ 30  ┆ Female │
│ Bob     ┆ 28  ┆ Male   │
│ Charlie ┆ 22  ┆ Male   │
│ Diana   ┆ 26  ┆ Female │
└─────────┴─────┴────────┘

3. Grouping and Aggregation

To group by a column and perform aggregation, use the group_by and agg methods (older Polars releases spelled the first one groupby, which is now deprecated):

Python

# Grouping and aggregation
# maintain_order=True keeps groups in order of first appearance;
# without it, the row order of the result is not guaranteed
grouped_df = combined_df.group_by("Gender", maintain_order=True).agg(
    pl.col("Age").mean().alias("Average Age")
)
print(grouped_df)

Output:

shape: (2, 2)
┌────────┬─────────────┐
│ Gender ┆ Average Age │
│ ---    ┆ ---         │
│ str    ┆ f64         │
╞════════╪═════════════╡
│ Male   ┆ 25.0        │
│ Female ┆ 28.0        │
└────────┴─────────────┘

4. Selecting Columns

To select specific columns, you can use the select method:

Python

# Select the "Name" and "Age" columns
df_selected = df.select(["Name", "Age"])
print(df_selected)

Output:

shape: (3, 2)
┌───────┬─────┐
│ Name  ┆ Age │
│ ---   ┆ --- │
│ str   ┆ i64 │
╞═══════╪═════╡
│ John  ┆ 25  │
│ Alice ┆ 30  │
│ Bob   ┆ 28  │
└───────┴─────┘

5. Adding New Columns

To add a new column, use the with_columns method (the singular with_column has been removed from current Polars releases):

Python

# Add a new column "Age_in_5_years" which is Age + 5
df = df.with_columns((pl.col("Age") + 5).alias("Age_in_5_years"))
print(df)

Output:

shape: (3, 4)
┌───────┬─────┬────────┬────────────────┐
│ Name  ┆ Age ┆ Gender ┆ Age_in_5_years │
│ ---   ┆ --- ┆ ---    ┆ ---            │
│ str   ┆ i64 ┆ str    ┆ i64            │
╞═══════╪═════╪════════╪════════════════╡
│ John  ┆ 25  ┆ Male   ┆ 30             │
│ Alice ┆ 30  ┆ Female ┆ 35             │
│ Bob   ┆ 28  ┆ Male   ┆ 33             │
└───────┴─────┴────────┴────────────────┘

6. Sorting Data

To sort the DataFrame by a specific column, use the sort method:

Python

# Sort by "Age" in descending order
df_sorted = df.sort("Age", descending=True)
print(df_sorted)

Output:

shape: (3, 4)
┌───────┬─────┬────────┬────────────────┐
│ Name  ┆ Age ┆ Gender ┆ Age_in_5_years │
│ ---   ┆ --- ┆ ---    ┆ ---            │
│ str   ┆ i64 ┆ str    ┆ i64            │
╞═══════╪═════╪════════╪════════════════╡
│ Alice ┆ 30  ┆ Female ┆ 35             │
│ Bob   ┆ 28  ┆ Male   ┆ 33             │
│ John  ┆ 25  ┆ Male   ┆ 30             │
└───────┴─────┴────────┴────────────────┘

Mastering Polars: High-Efficiency Data Analysis and Manipulation

Efficient data manipulation and analysis are central to data science and analytics. While pandas has long been the go-to library for many Python users, Polars, a newer DataFrame library, is gaining attention for its performance and efficiency. This article introduces Polars, highlights its features, and walks through practical examples to get you started.

Table of Contents

Understanding Polars Library
Why is Polars Used for Data Science?
Getting Started with Polars: Implementation
Advanced Features: Parallel Processing and Lazy Evaluation
Integration with Other Libraries
Advantages and Disadvantages of Polars
