Vaex follows a zero memory copy policy
Unlike Pandas, Vaex does not copy memory during data filtering, selections, subsetting, or cleansing. Take data filtering as an example: because no data is copied, the operation uses very little memory and its execution time is minimal.
Pandas:
Python3
%time df_pandas_filtered = df_pandas[df_pandas['column5'] > 1]
Output:
Wall time: 24.1 s
Vaex:
Python3
%time df_vaex_filtered = df_vaex[df_vaex['column5'] > 1]
Output:
Wall time: 91.4 ms
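One way to picture why the Vaex filter returns so quickly is to treat a filtered DataFrame as the original column plus a record of which rows match. The sketch below is a toy model in plain Python, not Vaex's actual implementation; the class `MaskedView` and the helper `filter_greater_than` are hypothetical names introduced only for illustration.

```python
from array import array


class MaskedView:
    """Toy filtered view: shares the column buffer and stores only a row mask."""

    def __init__(self, column, mask):
        self.column = column  # reference to the original buffer, not a copy
        self.mask = mask      # one flag per row: 1 = selected, 0 = not selected

    def __len__(self):
        # Number of selected rows.
        return sum(self.mask)

    def values(self):
        # Selected rows are produced lazily, only when someone iterates.
        return (v for v, keep in zip(self.column, self.mask) if keep)


def filter_greater_than(column, threshold):
    """Build a view of `column` keeping rows whose value exceeds `threshold`."""
    mask = bytearray(v > threshold for v in column)
    return MaskedView(column, mask)


column5 = array("d", [0.5, 3.0, 1.0, 7.25, -2.0])
view = filter_greater_than(column5, 1)

print(view.column is column5)  # the view points at the same buffer
print(len(view))               # number of rows that pass the filter
print(list(view.values()))     # rows are materialized only on request
```

The only new allocation is the mask, one byte per row in this toy version, while the column itself is never duplicated.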
Here, filtering returns a reference to the existing data together with a boolean mask that tracks which rows are selected and which are not. Vaex can also perform multiple computations in a single pass over the data:
Python3
df_vaex.select(df_vaex.column4 < 20, name='less_than')
df_vaex.select(df_vaex.column4 >= 20, name='gr_than')
%time df_vaex.mean(df_vaex.column4, selection=['less_than', 'gr_than'])
Output:
Wall time: 128 ms
array([ 9.4940431, 59.49137605])
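The idea of evaluating several selections in one pass can be sketched in plain Python. This is only an illustration of the principle under simplifying assumptions, not how Vaex is implemented; the function `means_in_one_pass` and the sample values are hypothetical.

```python
def means_in_one_pass(values, selections):
    """Compute the mean of `values` under each named selection in one scan.

    `selections` maps a name to a predicate function. The structure only
    mirrors the shape of the example above and is a hypothetical helper.
    """
    totals = {name: 0.0 for name in selections}
    counts = {name: 0 for name in selections}

    for v in values:  # the data is read exactly once
        for name, predicate in selections.items():
            if predicate(v):
                totals[name] += v
                counts[name] += 1

    means = {}
    for name in selections:
        # An empty selection has no mean; report NaN instead of dividing by zero.
        means[name] = totals[name] / counts[name] if counts[name] else float("nan")
    return means


column4 = [5, 12, 19, 20, 40, 90]  # made-up sample data
result = means_in_one_pass(
    column4,
    {
        "less_than": lambda v: v < 20,
        "gr_than": lambda v: v >= 20,
    },
)
print(result)  # {'less_than': 12.0, 'gr_than': 50.0}
```

Each row is tested against every selection while it is in hand, so adding a selection adds work per row but not another scan of the data.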
Introduction to Vaex in Python
Working with big data has become very common, so we need libraries that let us handle large datasets on our own systems (desktops, laptops) with fast execution and low memory usage.
Vaex is a Python library that helps us achieve this and makes working with large datasets much easier. It provides lazy, out-of-core DataFrames (similar to Pandas) and can visualize, explore, and perform computations on big tabular datasets quickly and with minimal memory usage.