Convert Python Dictionary List to PySpark DataFrame

In this article, we will discuss how to convert a list of Python dictionaries to a PySpark DataFrame.

It can be done in these ways:

  • Using an inferred schema
  • Using an explicit schema
  • Using SQL Expression

Method 1: Infer schema from the dictionary

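We will pass the list of dictionaries directly to the createDataFrame() method. Spark takes the column names from the dictionary keys and infers each column's data type from the values.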

Syntax: spark.createDataFrame(data)

Example: Python code to create a PySpark DataFrame from a dictionary list using this method

Python3




# import the modules
from pyspark.sql import SparkSession
  
# Create Spark session app name
# is GFG and master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# Create data frame from dictionary list
df = spark.createDataFrame(data)
  
# display
df.show()


Output:

Method 2: Using Explicit schema

Here we are going to create a schema and pass it along with the data to the createDataFrame() method.

Schema structure:

schema = StructType([
   StructField('column_1', DataType(), False),
   StructField('column_2', DataType(), False)])

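Here, column_1 and column_2 are the dictionary keys that become column names in the PySpark DataFrame, DataType() is the data type of that column (for example, StringType()), and the third argument states whether the column may contain null values (False means nulls are not allowed).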

Syntax: spark.createDataFrame(data, schema)

Where, 

  • data is the dictionary list
  • schema is the schema of the dataframe

Python program to create pyspark dataframe from dictionary lists using this method.

Python3




# import the modules
from pyspark.sql import SparkSession
from pyspark.sql.types import (StructField, StructType,
                               StringType, IntegerType, FloatType)
  
  
# Create Spark session app name is
# GFG and master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# specify the schema
schema = StructType([
    StructField('Name', StringType(), False),
    StructField('ID', IntegerType(), False),
    StructField('Percentage', FloatType(), True)
])
  
# Create data frame from
# dictionary list through the schema
df = spark.createDataFrame(data, schema)
  
# display
df.show()


Output:

Method 3: Using SQL Expression

Here we use the Row class to convert each Python dictionary into a Row object, and then create the PySpark DataFrame from the resulting list of rows.

Syntax: spark.createDataFrame([Row(**iterator) for iterator in data])

where: 

  • createDataFrame() is the method to create the dataframe
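  • Row(**iterator) unpacks one dictionary into a Row whose field names are the dictionary keys; the list comprehension repeats this for every dictionary in the list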
  • data is the dictionary list

Python code to convert dictionary list to pyspark dataframe.

Python3




# import the modules
from pyspark.sql import SparkSession, Row
  
# Create Spark session app name
# is GFG and master name is local
spark = SparkSession.builder.appName("GFG").master("local").getOrCreate()
  
# dictionary list of college data
data = [{"Name": 'sravan kumar',
         "ID": 1,
         "Percentage": 94.29},
        {"Name": 'sravani',
         "ID": 2,
         "Percentage": 84.29},
        {"Name": 'kumar',
         "ID": 3,
         "Percentage": 94.29}
        ]
  
# create dataframe by converting each dictionary to a Row
dataframe = spark.createDataFrame([Row(**variable) 
                                   for variable in data])
  
dataframe.show()


Output:


