Building Sample DataFrames
Let us build two sample DataFrames in Scala to perform a join on.
import org.apache.spark.sql.SparkSession

object JoinDFs {
  def main(args: Array[String]): Unit = {
    // Create a local SparkSession with a single worker thread.
    val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("JoinDFs")
      .getOrCreate()

    // Class DataFrame: one row per student.
    val class_columns = Seq("Id", "Name")
    val class_data = Seq((1, "Dhruv"), (2, "Akash"), (3, "Aayush"))
    val class_df = spark.createDataFrame(class_data).toDF(class_columns: _*)

    // Result DataFrame: one row per student per subject.
    val result_columns = Seq("Id", "Subject", "Score")
    val result_data = Seq((1, "Maths", 98), (2, "Maths", 99), (3, "Maths", 94),
      (1, "Physics", 95), (2, "Physics", 97), (3, "Physics", 99))
    val result_df = spark.createDataFrame(result_data).toDF(result_columns: _*)

    class_df.show()
    result_df.show()

    spark.stop()
  }
}
Output:
+---+------+
| Id|  Name|
+---+------+
|  1| Dhruv|
|  2| Akash|
|  3|Aayush|
+---+------+

+---+-------+-----+
| Id|Subject|Score|
+---+-------+-----+
|  1|  Maths|   98|
|  2|  Maths|   99|
|  3|  Maths|   94|
|  1|Physics|   95|
|  2|Physics|   97|
|  3|Physics|   99|
+---+-------+-----+
Explanation:
Here we have created two DataFrames.
- The first one is the class DataFrame, which contains information about the students in a classroom.
- The second one is the result DataFrame, which contains the students' marks in Maths and Physics.
- We will form a combined DataFrame that contains both the student and the result information.
Let us now see how to join these DataFrames.
How to Join Two DataFrames in Scala?
Spark's DataFrame API provides a join method for this. It takes the other DataFrame, a join condition (either a column expression or the name of a column present in both DataFrames), and an optional join type such as "inner", "left", "right", or "full", with "inner" being the default. When the join key is passed as a column name (or a Seq of names), the key column appears only once in the joined result instead of being duplicated from both sides.
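As a minimal sketch building on the two DataFrames created above (the object name and app name here are illustrative, not from the original article), an inner join on the shared Id column can be written like this:

```scala
import org.apache.spark.sql.SparkSession

object JoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("JoinExample")
      .getOrCreate()

    // Same sample data as in the previous snippet.
    val class_df = spark.createDataFrame(
      Seq((1, "Dhruv"), (2, "Akash"), (3, "Aayush"))
    ).toDF("Id", "Name")
    val result_df = spark.createDataFrame(
      Seq((1, "Maths", 98), (2, "Maths", 99), (3, "Maths", 94),
        (1, "Physics", 95), (2, "Physics", 97), (3, "Physics", 99))
    ).toDF("Id", "Subject", "Score")

    // Inner join on the shared "Id" column. Passing the key as a Seq
    // keeps a single Id column in the output; the last argument selects
    // the join type ("inner", "left", "right", "full", ...).
    val joined_df = class_df.join(result_df, Seq("Id"), "inner")
    joined_df.show()

    spark.stop()
  }
}
```

Each student row from class_df is matched with every result row that has the same Id, so the joined DataFrame has the columns Id, Name, Subject, and Score, with one row per student per subject.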