Conversion Using toDF() Implicit Method
Another common way to turn an RDD into a DataFrame is the .toDF() method. Before we start, we need to import the implicit conversions from our SparkSession instance; this import is what brings .toDF() into scope and lets us work with DataFrames smoothly.
Below is the code:
import org.apache.spark.sql.SparkSession

object RDDToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDD to DataFrame")
      .master("local[*]")
      .getOrCreate()

    val data = Seq(
      ("John", 30),
      ("Alice", 25),
      ("Bob", 35)
    )

    // Turn the local collection into an actual RDD
    val rdd = spark.sparkContext.parallelize(data)

    // Brings the implicit toDF() method into scope
    import spark.implicits._

    val df = rdd.toDF("Name", "Age")
    df.show()

    spark.stop()
  }
}
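As a side note, calling toDF() with no arguments makes Spark fall back to the tuples' default field names (_1, _2, and so on). A minimal sketch, assuming an active SparkSession named spark:

```scala
import spark.implicits._

// Without explicit column names, tuple fields become _1 and _2
val dfDefault = spark.sparkContext
  .parallelize(Seq(("John", 30), ("Alice", 25)))
  .toDF()

dfDefault.printSchema()
// root
//  |-- _1: string (nullable = true)
//  |-- _2: integer (nullable = false)
```

Passing explicit names, as in the example above, is usually preferable so downstream code does not depend on positional column names.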
With that in place, we are all set to convert our RDD. Note, however, that toDF() only works for RDDs whose element type has an implicit Encoder: primitives such as Int, Long, and String, or subclasses of scala.Product (tuples and case classes). Suppose we have an RDD constructed from a sequence of tuples. In that case, we can utilize our imported implicit method as follows:
Below is the code:
val dfUsingToDFMethod = rdd.toDF("EmployeeName", "Department", "Salary")
Now, let’s take a peek at the schema of our freshly minted DataFrame:
Below is the code:
dfUsingToDFMethod.printSchema()
Upon execution, this prints the schema of the DataFrame. Assuming the RDD holds (String, String, Int) tuples, the output looks like:
Output:
root
 |-- EmployeeName: string (nullable = true)
 |-- Department: string (nullable = true)
 |-- Salary: integer (nullable = false)
This showcases the schema structure of our DataFrame: the column names and the data types Spark inferred for them.
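Since case classes are subclasses of scala.Product, the same implicit conversion works for an RDD of case class instances, with column names taken from the field names. A minimal sketch, assuming an active SparkSession named spark (the Employee case class here is illustrative, not from the article):

```scala
// Defined at top level so Spark can derive its encoder
case class Employee(employeeName: String, department: String, salary: Int)

import spark.implicits._

val employeeDF = spark.sparkContext
  .parallelize(Seq(
    Employee("John", "Sales", 4000),
    Employee("Alice", "Engineering", 5500)
  ))
  .toDF() // column names come from the case class fields

employeeDF.printSchema()
// root
//  |-- employeeName: string (nullable = true)
//  |-- department: string (nullable = true)
//  |-- salary: integer (nullable = false)
```

Note that the case class should be defined at the top level rather than inside a method; otherwise Spark may fail to derive an encoder for it.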
How to Convert RDD to DataFrame in Spark Scala?
This article focuses on discussing ways to convert an RDD to a DataFrame in Spark Scala.
Table of Content
- RDD and DataFrame in Spark
- Convert Using createDataFrame Method
- Conversion Using toDF() Implicit Method
- Conclusion
- FAQs