Getting Started with Polars: Implementation

Installing Polars

Before diving into examples, you need to install Polars. You can do this using pip:

pip install polars

Creating a DataFrame

Creating a DataFrame in Polars is straightforward. You can create a DataFrame from a dictionary, list of lists, or even from a CSV file.

Python
import polars as pl

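# Create a sample dataset: each inner list is one row
data = [["John", 25, "Male"], ["Alice", 30, "Female"], ["Bob", 28, "Male"]]
# orient="row" states explicitly that the inner lists are rows, not columns
df = pl.DataFrame(data, schema=["Name", "Age", "Gender"], orient="row")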

# Basic data exploration
print(df)

Output:

shape: (3, 3)
┌───────┬─────┬────────┐
│ Name  ┆ Age ┆ Gender │
│ ---   ┆ --- ┆ ---    │
│ str   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ John  ┆ 25  ┆ Male   │
│ Alice ┆ 30  ┆ Female │
│ Bob   ┆ 28  ┆ Male   │
└───────┴─────┴────────┘

Basic DataFrame Operations

Polars provides a rich set of functions for data manipulation. Here are some common operations:

1. Filtering and Aggregation

To filter rows based on a condition, use the filter method:

Python
# Filtering and aggregation
male_ages = df.filter(pl.col("Gender") == "Male").select("Age")
average_male_age = male_ages.mean()

print(male_ages)
print(average_male_age)

Output:

shape: (2, 1)
┌─────┐
│ Age │
│ --- │
│ i64 │
╞═════╡
│ 25  │
│ 28  │
└─────┘
shape: (1, 1)
┌──────┐
│ Age  │
│ ---  │
│ f64  │
╞══════╡
│ 26.5 │
└──────┘

2. Concatenating DataFrames

Python
# Concatenating DataFrames: build a second frame with the same columns
more_data = [["Charlie", 22, "Male"], ["Diana", 26, "Female"]]
another_df = pl.DataFrame(more_data, schema=["Name", "Age", "Gender"], orient="row")

combined_df = pl.concat([df, another_df], how="diagonal")
print(combined_df)

Output:

shape: (5, 3)
┌─────────┬─────┬────────┐
│ Name    ┆ Age ┆ Gender │
│ ---     ┆ --- ┆ ---    │
│ str     ┆ i64 ┆ str    │
╞═════════╪═════╪════════╡
│ John    ┆ 25  ┆ Male   │
│ Alice   ┆ 30  ┆ Female │
│ Bob     ┆ 28  ┆ Male   │
│ Charlie ┆ 22  ┆ Male   │
│ Diana   ┆ 26  ┆ Female │
└─────────┴─────┴────────┘

3. Grouping and Aggregation

To group by a column and aggregate within each group, use the group_by and agg methods (group_by replaces the deprecated groupby):

Python
# Grouping and aggregation
# maintain_order=True keeps groups in order of first appearance, so the output is reproducible
grouped_df = combined_df.group_by("Gender", maintain_order=True).agg(
    pl.col("Age").mean().alias("Average Age")
)
print(grouped_df)

Output:

shape: (2, 2)
┌────────┬─────────────┐
│ Gender ┆ Average Age │
│ ---    ┆ ---         │
│ str    ┆ f64         │
╞════════╪═════════════╡
│ Male   ┆ 25.0        │
│ Female ┆ 28.0        │
└────────┴─────────────┘

4. Selecting Columns

To select specific columns, you can use the select method:

Python
# Select the "Name" and "Age" columns
df_selected = df.select(["Name", "Age"])
print(df_selected)

Output:

shape: (3, 2)
┌───────┬─────┐
│ Name  ┆ Age │
│ ---   ┆ --- │
│ str   ┆ i64 │
╞═══════╪═════╡
│ John  ┆ 25  │
│ Alice ┆ 30  │
│ Bob   ┆ 28  │
└───────┴─────┘

5. Adding New Columns

To add a new column, use the with_columns method (the older with_column method has been removed from current Polars releases):

Python
# Add a new column "Age_in_5_years" which is Age + 5
df = df.with_columns((pl.col("Age") + 5).alias("Age_in_5_years"))
print(df)

Output:

shape: (3, 4)
┌───────┬─────┬────────┬────────────────┐
│ Name  ┆ Age ┆ Gender ┆ Age_in_5_years │
│ ---   ┆ --- ┆ ---    ┆ ---            │
│ str   ┆ i64 ┆ str    ┆ i64            │
╞═══════╪═════╪════════╪════════════════╡
│ John  ┆ 25  ┆ Male   ┆ 30             │
│ Alice ┆ 30  ┆ Female ┆ 35             │
│ Bob   ┆ 28  ┆ Male   ┆ 33             │
└───────┴─────┴────────┴────────────────┘

6. Sorting Data

To sort the DataFrame by a specific column, use the sort method:

Python
# Sort by "Age" in descending order (the keyword is descending, not reverse)
df_sorted = df.sort("Age", descending=True)
print(df_sorted)

Output:

shape: (3, 4)
┌───────┬─────┬────────┬────────────────┐
│ Name  ┆ Age ┆ Gender ┆ Age_in_5_years │
│ ---   ┆ --- ┆ ---    ┆ ---            │
│ str   ┆ i64 ┆ str    ┆ i64            │
╞═══════╪═════╪════════╪════════════════╡
│ Alice ┆ 30  ┆ Female ┆ 35             │
│ Bob   ┆ 28  ┆ Male   ┆ 33             │
│ John  ┆ 25  ┆ Male   ┆ 30             │
└───────┴─────┴────────┴────────────────┘

Mastering Polars: High-Efficiency Data Analysis and Manipulation

In the ever-evolving landscape of data science and analytics, efficient data manipulation and analysis are paramount. While pandas has been the go-to library for many Python enthusiasts, a new player, Polars, is making waves with its performance and efficiency. This article delves into the world of Polars, providing a comprehensive introduction, highlighting its features, and showcasing practical examples to get you started.

Table of Contents

  • Understanding Polars Library
  • Why is Polars Used for Data Science?
  • Getting Started with Polars: Implementation
  • Advanced Features: Parallel Processing and Lazy Evaluation
  • Integration with Other Libraries
  • Advantages and Disadvantages of Polars

Understanding Polars Library

Polars is an open-source DataFrame library designed for high-performance data manipulation and analysis. Written in Rust, it leverages Rust's memory safety and concurrency features to offer a fast and efficient alternative to pandas. It is built around columnar data and offers an extensive collection of tools for joining, filtering, aggregating, and otherwise manipulating data. Because it is designed to take advantage of contemporary CPU architectures, it is particularly well suited to handling large datasets and performing complex operations....

Why is Polars Used for Data Science?

Polars' expressiveness, performance, and capacity to manage big datasets make it an excellent choice for data science applications. Data scientists favor Polars for the following main reasons:...

Advanced Features: Parallel Processing and Lazy Evaluation

Polars parallelizes computations by default and supports lazy evaluation, which lets it optimize the query plan before running it....

Integration with Other Libraries

Polars can seamlessly integrate with other popular Python libraries, such as NumPy and pandas....

Advantages and Disadvantages of Polars

Advantages of Polars...

Conclusion

Polars is a powerful and efficient DataFrame library that offers a compelling alternative to pandas. With its high performance, memory efficiency, and expressive API, Polars is well-suited for handling large datasets and complex data manipulations. Whether you are a data scientist, analyst, or developer, Polars can help you achieve your data processing goals with ease. By incorporating Polars into your data workflow, you can leverage its advanced features, such as lazy evaluation and parallel processing, to optimize your data operations and improve performance....