How to get a value from the Row object in PySpark DataFrame?

In this article, we are going to learn how to get a value from the Row object in PySpark DataFrame.

Method 1 : Using __getitem__() magic method

We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(). Finally, we call the __getitem__() magic method on that Row to get the value stored under a particular column name. Given below is the syntax.

Syntax : Row.__getitem__('Column_Name')

Returns : value corresponding to the column name in the Row object


# library import
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

# Session Creation
# (the application name is an arbitrary placeholder)
random_value_session = SparkSession.builder.appName(
    'row_value_demo').getOrCreate()

# Data filled in our DataFrame
# 5 rows below
rows = [['All England Open', 'March', 'Super 1000'],
        ['Malaysia Open', 'January', 'Super 750'],
        ['Korea Open', 'April', 'Super 500'],
        ['Hylo Open', 'November', 'Super 100'],
        ['Spain Masters', 'March', 'Super 300']]

# Columns of our DataFrame
columns = ['Tournament', 'Month', 'Level']

# DataFrame is created
dataframe = random_value_session.createDataFrame(rows, columns)

# Showing the DataFrame
dataframe.show()

# getting list of rows using collect()
row_list = dataframe.collect()

# Printing the first Row object
# from which data is extracted
print(row_list[0])

# Using __getitem__() magic method
# To get value corresponding to a particular
# column
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Tournament'))
print(row_list[0].__getitem__('Level'))


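Output :

+----------------+--------+----------+
|      Tournament|   Month|     Level|
+----------------+--------+----------+
|All England Open|   March|Super 1000|
|   Malaysia Open| January| Super 750|
|      Korea Open|   April| Super 500|
|       Hylo Open|November| Super 100|
|   Spain Masters|   March| Super 300|
+----------------+--------+----------+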

Row(Tournament='All England Open', Month='March', Level='Super 1000')
Super 1000
All England Open
Super 1000

Method 2 : Using asDict() method

We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(). Finally, we use the asDict() method to get a dictionary in which the column names are the keys and the row's values are the dictionary values. Given below is the syntax:

Syntax : Row.asDict(recursive=False)

Parameters :

recursive: bool : if True, nested Row objects are also converted to dictionaries. The default value is False.

We can then easily get a value from the dictionary using DictionaryName['key_name'].


# library imports are done here
import pyspark
from pyspark.sql import SparkSession

# Session Creation
# (the application name is an arbitrary placeholder)
random_value_session = SparkSession.builder.appName(
    'row_value_demo').getOrCreate()

# Data filled in our DataFrame
# Rows below will be filled
rows = [['French Open', 'October', 'Super 750'],
        ['Macau Open', 'November', 'Super 300'],
        ['India Open', 'January', 'Super 500'],
        ['Odisha Open', 'January', 'Super 100'],
        ['China Open', 'November', 'Super 1000']]

# DataFrame Columns
columns = ['Tournament', 'Month', 'Level']

# DataFrame creation
dataframe = random_value_session.createDataFrame(rows, columns)

# DataFrame print
dataframe.show()

# list of rows using collect()
row_list = dataframe.collect()

# Printing the second Row object
# from which we will read data
print(row_list[1])

# Printing dictionary to make
# things more clear
print(row_list[1].asDict())

# Using asDict() method to convert row object
# into a dictionary where the column names are keys
dict_row = row_list[1].asDict()

# Using column names as keys to get respective values
print(dict_row['Tournament'])
print(dict_row['Level'])

Output : 

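+-----------+--------+----------+
| Tournament|   Month|     Level|
+-----------+--------+----------+
|French Open| October| Super 750|
| Macau Open|November| Super 300|
| India Open| January| Super 500|
|Odisha Open| January| Super 100|
| China Open|November|Super 1000|
+-----------+--------+----------+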

Row(Tournament='Macau Open', Month='November', Level='Super 300')

{'Tournament': 'Macau Open', 'Month': 'November', 'Level': 'Super 300'}

Macau Open
Super 300

Method 3 : Treating the Row object like a list

Here we treat a Row object like a Python list and read its values by position. This works because Row is a subclass of Python's tuple, so it supports integer indexing. We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(). Since we are treating the Row object like a list, we just use :

Syntax : RowObject[index]

Returns : Value of the column at that position in the Row object.


# library imports are done here
import pyspark
from pyspark.sql import SparkSession

# Session Creation
# (the application name is an arbitrary placeholder)
random_value_session = SparkSession.builder.appName(
    'row_value_demo').getOrCreate()

# Data filled in our DataFrame
# Rows below will be filled
rows = [['Denmark Open', 'October', 'Super 1000'],
        ['Indonesia Open', 'June', 'Super 1000'],
        ['Korea Open', 'April', 'Super 500'],
        ['Japan Open', 'August', 'Super 750'],
        ['Akita Masters', 'July', 'Super 100']]

# DataFrame Columns
columns = ['Tournament', 'Month', 'Level']

# DataFrame creation
dataframe = random_value_session.createDataFrame(rows, columns)

# DataFrame print
dataframe.show()

# list of rows using collect()
row_list = dataframe.collect()

# Let's take the third Row object
row_object = row_list[2]

# If we imagine it as a Python List,
# we can get the first value of the list,
# index 0, let's try it
print(row_object[0])

# We got the value of the column at index 0,
# which is 'Tournament'

# A few more examples
print(row_list[4][0])
print(row_list[4][2])


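Output :

+--------------+-------+----------+
|    Tournament|  Month|     Level|
+--------------+-------+----------+
|  Denmark Open|October|Super 1000|
|Indonesia Open|   June|Super 1000|
|    Korea Open|  April| Super 500|
|    Japan Open| August| Super 750|
| Akita Masters|   July| Super 100|
+--------------+-------+----------+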

Korea Open
Akita Masters
Super 100
