How to Use functools in Python
The functools module provides functions for working with other functions and callable objects, letting you use or extend them without completely rewriting them.
Syntax:
functools.reduce(lambda df1, df2: df1.union(df2.select(df1.columns)), dfs)
where,
- df1 is the first dataframe
- df2 is the second dataframe
- dfs is the list of dataframes to be combined
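Before applying it to dataframes, it helps to see what functools.reduce does on its own: it folds a two-argument function across a sequence from left to right, so `reduce(f, [a, b, c])` evaluates `f(f(a, b), c)`. A minimal sketch with plain numbers:

```python
import functools

# reduce() repeatedly applies the lambda to an accumulated value
# and the next element: ((1 + 2) + 3) + 4
numbers = [1, 2, 3, 4]
total = functools.reduce(lambda x, y: x + y, numbers)
print(total)  # 10
```

The unionAll() pattern works the same way, except the folding function is `df1.union(df2.select(df1.columns))` instead of addition.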
We create two dataframes with the columns Name, DOB, Gender, and salary (built below), pass them as a list to the unionAll() method defined above, and display the resulting dataframe.
Example:
Python3
import functools

# explicit function
def unionAll(dfs):
    return functools.reduce(
        lambda df1, df2: df1.union(df2.select(df1.columns)), dfs)

# unionAll
result3 = unionAll([df1, df2])
result3.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|   Ram|1991-04-01|     M|  3000|
|  Mike|2000-05-19|     M|  4000|
|Rohini|1978-09-05|     M|  4000|
| Maria|1967-12-01|     F|  4000|
| Jenis|1980-02-17|     F|  1200|
|  Mohi|1991-04-01|     M|  3000|
|   Ani|2000-05-19|     F|  4300|
|Shipta|1978-09-05|     F|  4200|
| Jessy|1967-12-01|     F|  4010|
| kanne|1980-02-17|     F|  1200|
+------+----------+------+------+
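Note that every row from both dataframes appears in the result: DataFrame.union behaves like SQL UNION ALL and does not remove duplicate rows. A rough sketch of the same folding pattern using plain Python lists of rows (a hypothetical stand-in for Spark DataFrames, since the behavior here is just concatenation):

```python
import functools

# Plain lists of row tuples standing in for two dataframes
rows1 = [("Ram", 3000), ("Mike", 4000)]
rows2 = [("Ram", 3000), ("Maria", 4000)]

# Like DataFrame.union, concatenation keeps duplicate rows,
# so the shared ("Ram", 3000) row appears twice.
combined = functools.reduce(lambda a, b: a + b, [rows1, rows2])
print(combined)
# [('Ram', 3000), ('Mike', 4000), ('Ram', 3000), ('Maria', 4000)]
```

If duplicate rows are not wanted, PySpark's distinct() can be applied to the union result.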
Concatenate Two PySpark DataFrames
In this article, we are going to see how to concatenate two PySpark dataframes using Python.
Creating Dataframe for demonstration:
Python3
# Importing necessary libraries
from pyspark.sql import SparkSession

# Create a spark session
spark = SparkSession.builder.appName(
    'pyspark - example join').getOrCreate()

# Create data in dataframe
data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]

# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]

# Create the spark dataframe
df1 = spark.createDataFrame(data=data, schema=columns)

# Print the dataframe
df1.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|   Ram|1991-04-01|     M|  3000|
|  Mike|2000-05-19|     M|  4000|
|Rohini|1978-09-05|     M|  4000|
| Maria|1967-12-01|     F|  4000|
| Jenis|1980-02-17|     F|  1200|
+------+----------+------+------+
Creating Second dataframe for demonstration:
Python3
# Create data in dataframe
data2 = [('Mohi', '1991-04-01', 'M', 3000),
         ('Ani', '2000-05-19', 'F', 4300),
         ('Shipta', '1978-09-05', 'F', 4200),
         ('Jessy', '1967-12-01', 'F', 4010),
         ('kanne', '1980-02-17', 'F', 1200)]

# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]

# Create the spark dataframe from data2
df2 = spark.createDataFrame(data=data2, schema=columns)

# Print the dataframe
df2.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|  Mohi|1991-04-01|     M|  3000|
|   Ani|2000-05-19|     F|  4300|
|Shipta|1978-09-05|     F|  4200|
| Jessy|1967-12-01|     F|  4010|
| kanne|1980-02-17|     F|  1200|
+------+----------+------+------+