How to Use functools in Python
The functools module provides functions for working with other functions and callable objects, letting you use or extend them without completely rewriting them.
Syntax:
functools.reduce(lambda df1, df2: df1.union(df2.select(df1.columns)), dfs)
where,
- df1 is the first dataframe
- df2 is the second dataframe
- dfs is the list of dataframes to be combined
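Before applying it to dataframes, it helps to see what functools.reduce does on its own: it folds a two-argument function across a sequence from left to right, so `reduce(f, [a, b, c])` evaluates `f(f(a, b), c)`. A minimal sketch with plain numbers:

```python
import functools

# reduce() repeatedly applies the lambda to an accumulated value
# and the next element: ((1 + 2) + 3) + 4
numbers = [1, 2, 3, 4]
total = functools.reduce(lambda x, y: x + y, numbers)
print(total)  # 10
```

The unionAll() pattern works the same way, except the folding function is `df1.union(df2.select(df1.columns))` instead of addition.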
We create two dataframes with the columns Name, DOB, Gender, and salary (built below), pass them as a list to the unionAll() method defined above, and display the resulting dataframe.
Example:
Python3
import functools

# explicit function
def unionAll(dfs):
    return functools.reduce(
        lambda df1, df2: df1.union(df2.select(df1.columns)), dfs)

# unionAll
result3 = unionAll([df1, df2])
result3.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|   Ram|1991-04-01|     M|  3000|
|  Mike|2000-05-19|     M|  4000|
|Rohini|1978-09-05|     M|  4000|
| Maria|1967-12-01|     F|  4000|
| Jenis|1980-02-17|     F|  1200|
|  Mohi|1991-04-01|     M|  3000|
|   Ani|2000-05-19|     F|  4300|
|Shipta|1978-09-05|     F|  4200|
| Jessy|1967-12-01|     F|  4010|
| kanne|1980-02-17|     F|  1200|
+------+----------+------+------+
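Note that every row from both dataframes appears in the result: DataFrame.union behaves like SQL UNION ALL and does not remove duplicate rows. A rough sketch of the same folding pattern using plain Python lists of rows (a hypothetical stand-in for Spark DataFrames, since the behavior here is just concatenation):

```python
import functools

# Plain lists of row tuples standing in for two dataframes
rows1 = [("Ram", 3000), ("Mike", 4000)]
rows2 = [("Ram", 3000), ("Maria", 4000)]

# Like DataFrame.union, concatenation keeps duplicate rows,
# so the shared ("Ram", 3000) row appears twice.
combined = functools.reduce(lambda a, b: a + b, [rows1, rows2])
print(combined)
# [('Ram', 3000), ('Mike', 4000), ('Ram', 3000), ('Maria', 4000)]
```

If duplicate rows are not wanted, PySpark's distinct() can be applied to the union result.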
Concatenate Two PySpark DataFrames
In this article, we are going to see how to concatenate two PySpark dataframes using Python.
Creating Dataframe for demonstration:
Python3
# Importing necessary libraries
from pyspark.sql import SparkSession

# Create a spark session
spark = SparkSession.builder.appName(
    'pyspark - example join').getOrCreate()

# Create data in dataframe
data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]

# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]

# Create the spark dataframe
df1 = spark.createDataFrame(data=data, schema=columns)

# Print the dataframe
df1.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|   Ram|1991-04-01|     M|  3000|
|  Mike|2000-05-19|     M|  4000|
|Rohini|1978-09-05|     M|  4000|
| Maria|1967-12-01|     F|  4000|
| Jenis|1980-02-17|     F|  1200|
+------+----------+------+------+
Creating Second dataframe for demonstration:
Python3
# Create data in dataframe
data2 = [('Mohi', '1991-04-01', 'M', 3000),
         ('Ani', '2000-05-19', 'F', 4300),
         ('Shipta', '1978-09-05', 'F', 4200),
         ('Jessy', '1967-12-01', 'F', 4010),
         ('kanne', '1980-02-17', 'F', 1200)]

# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]

# Create the spark dataframe from data2
df2 = spark.createDataFrame(data=data2, schema=columns)

# Print the dataframe
df2.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|  Mohi|1991-04-01|     M|  3000|
|   Ani|2000-05-19|     F|  4300|
|Shipta|1978-09-05|     F|  4200|
| Jessy|1967-12-01|     F|  4010|
| kanne|1980-02-17|     F|  1200|
+------+----------+------+------+