Why Vaex?

Vaex helps us work with large datasets quickly and efficiently through lazy computation, virtual columns, memory-mapping, a zero-memory-copy policy, efficient data cleansing, and more. Its algorithms emphasize aggregate properties of the data rather than individual samples, which lets it overcome several shortcomings of other libraries such as pandas. Let's explore Vaex.

Reading Performance

For large tabular data, Vaex can open a dataset much faster than pandas can read it. Let's compare by loading the same dataset with both libraries.

Reading Performance of Pandas:

Python3

import pandas as pd
# %time is an IPython/Jupyter magic that reports how long the statement takes
%time df_pandas = pd.read_csv("dataset1.csv")

Output:

Wall time: 1min 8s

Reading Performance of Vaex (we open the HDF5 version of the same dataset with vaex.open):

Python3

import vaex
# vaex.open memory-maps the HDF5 file instead of loading it all into RAM
%time df_vaex = vaex.open("dataset1.csv.hdf5")

Output:

Wall time: 1.34 s

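Vaex opened the dataset in about 1.3 s, against more than a minute for pandas. Note that the comparison is not like for like: pandas parses the CSV text, while Vaex memory-maps an HDF5 file and reads values only when they are needed. Both DataFrames hold the same number of rows and columns: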

Python3

print("Size =") 
print(df_pandas.shape) 
print(df_vaex.shape) 

Output:

Size =
(12852000, 36)
(12852000, 36)

Introduction to Vaex in Python

Working with big data has become very common, so we need libraries that let us handle large datasets on our own machines (desktops, laptops) with near-instantaneous execution and low memory usage.

Vaex is a Python library that helps us achieve this and makes working with large datasets easy. It provides lazy, out-of-core DataFrames (similar to pandas) and can visualize, explore, and perform computations on big tabular datasets quickly and with minimal memory usage.

