Building Sample DataFrames

Let us build two sample DataFrames in Scala to perform a join on.

Scala

import org.apache.spark.sql.SparkSession

object joindfs {
  def main(args: Array[String]): Unit = {
    // Start a local Spark session that uses a single worker thread.
    val spark: SparkSession = SparkSession.builder()
      .appName("joindfs")
      .master("local[1]")
      .getOrCreate()

    // Student details: one row per student.
    val class_columns = Seq("Id", "Name")
    val class_data    = Seq((1, "Dhruv"), (2, "Akash"), (3, "Aayush"))
    val class_df = spark.createDataFrame(class_data).toDF(class_columns: _*)

    // Exam results: one row per (student, subject) pair, keyed by Id.
    val result_column = Seq("Id", "Subject", "Score")
    val result_data   = Seq(
      (1, "Maths", 98), (2, "Maths", 99), (3, "Maths", 94),
      (1, "Physics", 95), (2, "Physics", 97), (3, "Physics", 99))
    val result_df = spark.createDataFrame(result_data).toDF(result_column: _*)

    // Print both DataFrames as tables.
    class_df.show()
    result_df.show()

    spark.stop()
  }
}

Output:

[Table: class_df]

[Table: result_df]

Explanation:

Here we have formed two DataFrames:

The first, class_df, holds the Id and Name of each student in a classroom.
The second, result_df, holds each student's Score in Maths and Physics, keyed by the same Id.
We will join them on the shared Id column to form a combined DataFrame that contains both the student and the result information.
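To make the intended combined result concrete, here is a minimal sketch in plain Scala collections (no Spark) that pairs every result row with the matching student by Id. It only illustrates inner-join semantics on the same sample data; it is not how Spark executes a join, and the names `JoinPreview` and `innerJoinById` are hypothetical.

```scala
object JoinPreview {
  // The same sample rows as class_df and result_df, as plain Scala tuples.
  val classData: Seq[(Int, String)] = Seq((1, "Dhruv"), (2, "Akash"), (3, "Aayush"))
  val resultData: Seq[(Int, String, Int)] = Seq(
    (1, "Maths", 98), (2, "Maths", 99), (3, "Maths", 94),
    (1, "Physics", 95), (2, "Physics", 97), (3, "Physics", 99))

  // Inner join on Id: index the students by Id, then emit one
  // (Id, Name, Subject, Score) row for every matching student.
  // Result rows whose Id has no student are dropped.
  def innerJoinById(
      students: Seq[(Int, String)],
      results: Seq[(Int, String, Int)]): Seq[(Int, String, String, Int)] = {
    val namesById: Map[Int, Seq[String]] =
      students.groupBy(_._1).map { case (id, rows) => id -> rows.map(_._2) }
    results.flatMap { case (id, subject, score) =>
      namesById.getOrElse(id, Seq.empty).map(name => (id, name, subject, score))
    }
  }

  def main(args: Array[String]): Unit =
    innerJoinById(classData, resultData).foreach(println)
}
```

Grouping by Id (rather than calling `toMap`) keeps every student that shares an Id, so duplicate keys would produce one output row per match, as an inner join does.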

Let us see how to join these DataFrames now.
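A minimal sketch of the join itself, assuming the same data as class_df and result_df and Spark's `Dataset.join(right, usingColumns, joinType)` overload. The names `JoinDfsSketch` and `joinOnId` are hypothetical, chosen for this example. Joining on `Seq("Id")`, rather than on an expression such as `class_df("Id") === result_df("Id")`, keeps a single Id column in the output.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinDfsSketch {
  // Inner join on the shared Id column; the output columns are
  // Id, Name, Subject, Score (join key first, then left, then right).
  def joinOnId(classDf: DataFrame, resultDf: DataFrame): DataFrame =
    classDf.join(resultDf, Seq("Id"), "inner")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("joindfs").master("local[1]").getOrCreate()
    import spark.implicits._

    // Same sample data as in the article, built with toDF via spark.implicits.
    val classDf = Seq((1, "Dhruv"), (2, "Akash"), (3, "Aayush")).toDF("Id", "Name")
    val resultDf = Seq(
      (1, "Maths", 98), (2, "Maths", 99), (3, "Maths", 94),
      (1, "Physics", 95), (2, "Physics", 97), (3, "Physics", 99)).toDF("Id", "Subject", "Score")

    // Sort only to make the printed table easier to read.
    joinOnId(classDf, resultDf).orderBy("Id", "Subject").show()
    spark.stop()
  }
}
```

Changing "inner" to "left", "right", or "outer" selects the other join types; with this sample data every Id appears on both sides, so all of them would return the same six rows.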

How to Join Two DataFrames in Scala?

Scala stands for "scalable language". It is statically typed, yet compared with statically typed languages such as C, C++, or Java, it needs far fewer explicit type annotations, because the compiler infers most types from the code. Type checking still happens at compile time. Static typing helps build safe systems by default: built-in compile-time checks and actionable error messages, combined with immutable (and therefore thread-safe) data structures and collections, catch many tricky bugs before the program first runs.
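A small illustration of the type-inference point, as a sketch: none of the values below carries a type annotation, yet the compiler infers their types and would still reject a type mismatch at compile time. `TypeInferenceDemo` and `names` are hypothetical names.

```scala
object TypeInferenceDemo {
  // No annotations: the compiler infers Int, String and List[(Int, String)].
  val id = 1
  val name = "Dhruv"
  val students = List((1, "Dhruv"), (2, "Akash"))

  // The return type is inferred as Seq[String].
  def names(rows: Seq[(Int, String)]) = rows.map(_._2)

  // val broken: Int = name  // would not compile: type mismatch (String vs Int)

  def main(args: Array[String]): Unit =
    println(names(students))
}
```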
