How to use spark.read.csv() In Python

spark.read.csv() loads CSV and other delimited text files into a DataFrame. If the inferSchema option is enabled, Spark makes an extra pass over the input to determine the schema before loading the data. To avoid that extra pass, leave inferSchema disabled (it is off by default) or specify the schema explicitly with the schema parameter.

Syntax: spark.read.csv(path)

Returns: DataFrame

Example: Read text file using spark.read.csv().

First, import the module and create a SparkSession. Then read the file with spark.read.csv(), split each line of the text file into an array of words, and show the result as a DataFrame column.


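from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession
spark = SparkSession.builder.getOrCreate()

# Read the text file; with the default comma separator, each line that
# contains no commas lands in the single default column _c0
df = spark.read.csv("output.txt")

# Split each line on spaces into an array and show the first 4 rows
# without truncating the values
df.selectExpr("split(_c0, ' ') as Text_Data_In_Rows_Using_CSV").show(4, False)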

Output:

Read Text File into PySpark DataFrame

In this article, we are going to see how to read text files into a PySpark DataFrame.

There are three ways to read text files into a PySpark DataFrame:

Using spark.read.text()
Using spark.read.csv()
Using spark.read.format().load()

Each of these can read a single text file, multiple files, or all files in a directory into a Spark DataFrame. (PySpark has no typed Dataset API; Datasets are available only in Scala and Java.)

Text file Used:


Method 1: Using spark.read.text()

Method 3: Using spark.read.format()

