Conversion Using toDF() Implicit Method
Another common way to turn an RDD into a DataFrame is the .toDF() method. Before we start, we need to import the implicit conversions from our SparkSession instance; this import is what brings .toDF() into scope and lets us work with DataFrames smoothly.
Below is the code:
import org.apache.spark.sql.SparkSession

object RDDToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDD to DataFrame")
      .master("local[*]")
      .getOrCreate()

    val data = Seq(
      ("John", 30),
      ("Alice", 25),
      ("Bob", 35)
    )

    // Turn the local collection into an actual RDD
    val rdd = spark.sparkContext.parallelize(data)

    // Brings the implicit toDF() method into scope
    import spark.implicits._

    val df = rdd.toDF("Name", "Age")
    df.show()

    spark.stop()
  }
}
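As a side note, calling toDF() with no arguments makes Spark fall back to the tuples' default field names (_1, _2, and so on). A minimal sketch, assuming an active SparkSession named spark:

```scala
import spark.implicits._

// Without explicit column names, tuple fields become _1 and _2
val dfDefault = spark.sparkContext
  .parallelize(Seq(("John", 30), ("Alice", 25)))
  .toDF()

dfDefault.printSchema()
// root
//  |-- _1: string (nullable = true)
//  |-- _2: integer (nullable = false)
```

Passing explicit names, as in the example above, is usually preferable so downstream code does not depend on positional column names.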
With that in place, we are all set to convert our RDD. Note, however, that toDF() only works for RDDs whose element type has an implicit Encoder: primitives such as Int, Long, and String, or subclasses of scala.Product (tuples and case classes). Suppose we have an RDD constructed from a sequence of tuples. In that case, we can utilize our imported implicit method as follows:
Below is the code:
val dfUsingToDFMethod = rdd.toDF("EmployeeName", "Department", "Salary")
Now, let’s take a peek at the schema of our freshly minted DataFrame:
Below is the code:
dfUsingToDFMethod.printSchema()
Upon execution, this prints the schema of the DataFrame. Assuming the RDD holds (String, String, Int) tuples, the output looks like:
Output:
root
 |-- EmployeeName: string (nullable = true)
 |-- Department: string (nullable = true)
 |-- Salary: integer (nullable = false)
This showcases the schema structure of our DataFrame: the column names and the data types Spark inferred for them.
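Since case classes are subclasses of scala.Product, the same implicit conversion works for an RDD of case class instances, with column names taken from the field names. A minimal sketch, assuming an active SparkSession named spark (the Employee case class here is illustrative, not from the article):

```scala
// Defined at top level so Spark can derive its encoder
case class Employee(employeeName: String, department: String, salary: Int)

import spark.implicits._

val employeeDF = spark.sparkContext
  .parallelize(Seq(
    Employee("John", "Sales", 4000),
    Employee("Alice", "Engineering", 5500)
  ))
  .toDF() // column names come from the case class fields

employeeDF.printSchema()
// root
//  |-- employeeName: string (nullable = true)
//  |-- department: string (nullable = true)
//  |-- salary: integer (nullable = false)
```

Note that the case class should be defined at the top level rather than inside a method; otherwise Spark may fail to derive an encoder for it.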
How to Convert RDD to DataFrame in Spark Scala?
This article focuses on discussing ways to convert an RDD to a DataFrame in Spark Scala.
Table of Content
- RDD and DataFrame in Spark
- Convert Using createDataFrame Method
- Conversion Using toDF() Implicit Method
- Conclusion
- FAQs