Understanding SparkSession

The SparkSession class, introduced in Spark 2.0, is the entry point to Spark functionality. It gives access to Spark's core abstractions (RDDs through its underlying SparkContext, plus DataFrames and Datasets) and offers a single interface for structured data processing. When developing a Spark SQL application, it is typically one of the first objects you create.

Let us dive deeper and better understand what SparkSession means.

  • SparkSession unifies contexts that were previously separate: it replaces SQLContext and HiveContext and wraps a SparkContext, giving a single entry point to Spark and its APIs. The DStream-based StreamingContext is still created separately, while Structured Streaming is reached through the session itself.
  • Through this one object, users can read data from multiple sources, execute SQL queries, create DataFrames and Datasets, and run actions on distributed datasets.
  • Starting from Spark 2.0, SparkSession provides built-in support for Hive features, including writing queries in HiveQL, using Hive UDFs, and reading data from Hive tables. Hive support is enabled on the builder with enableHiveSupport().
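The points above can be sketched in Scala as follows. This is a minimal illustration, not a reference setup: the object name, the application name, the local[*] master, and the sample rows are all assumptions made for the example, and it presumes the spark-sql library is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object SparkSessionSketch {

  // Build a session, or reuse one that already exists in this JVM.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("SparkSessionSketch") // assumed application name
      .master("local[*]")            // assumed: run locally on all cores
      // .enableHiveSupport()        // opt-in; needs Hive classes on the classpath
      .getOrCreate()

  // Create a small DataFrame, register it as a view, and query it with SQL.
  def namesOlderThan(spark: SparkSession, minAge: Int): Seq[String] = {
    import spark.implicits._ // enables toDF and the String encoder
    val people = Seq(("alice", 34), ("bob", 45)).toDF("name", "age")
    people.createOrReplaceTempView("people")
    spark.sql(s"SELECT name FROM people WHERE age > $minAge")
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = buildSession()

    // The wrapped SparkContext is still available for RDD work.
    val total = spark.sparkContext.parallelize(Seq(1, 2, 3)).reduce(_ + _)
    println(s"RDD total: $total")

    println(namesOlderThan(spark, 40))

    spark.stop()
  }
}
```

Because getOrCreate() returns an existing session when there is one, calling buildSession() more than once in the same application does not create duplicate sessions.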

How to create a SparkSession in Scala?

Scala stands for "scalable language". It was created by Martin Odersky and first released in 2003. It is an object-oriented language that also supports the functional programming approach. Every value in Scala is an object; for example, values like 1 and 2 can invoke methods such as toString(). Scala is statically typed, but it usually does not require explicit type annotations in the code, because the compiler infers most types. Type checking still happens at compile time. Static typing makes it possible to build safe systems by default: built-in checks and actionable error messages, combined with immutable, thread-safe data structures and collections, catch many tricky bugs before the program first runs.
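A short sketch of these language properties, using only the Scala standard library; the object and value names are hypothetical and chosen for illustration.

```scala
// Hypothetical example object illustrating type inference and "values are objects".
object ScalaTypingSketch {
  // No type annotations: the compiler infers Int here.
  val count = 1

  // Numeric values are objects, so methods can be called on them directly.
  val label = count.toString

  // Inferred as List[Int]; List is an immutable collection.
  val doubled = List(1, 2, 3).map(_ * 2)

  // A type mismatch is rejected at compile time. Uncommenting the next
  // line makes the program fail to compile instead of failing at runtime:
  // val wrong: Int = "one"

  def main(args: Array[String]): Unit = {
    println(label)
    println(doubled)
  }
}
```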
