How to check dataframe size in Scala?
In this article, we will learn how to check the size of a DataFrame in Scala. To check the size (i.e., the number of rows) of a DataFrame, you can use the count() function, which is an action that returns the row count as a Long.
Here’s how you can do it:
Syntax:
val size = dataframe.count()
Example #1:
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameSizeCheck {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("DataFrameSizeCheck")
      .master("local[*]")
      .getOrCreate()

    // Sample DataFrame (replace this with your actual DataFrame)
    val dataframe: DataFrame = spark.emptyDataFrame

    // Get the size of the DataFrame
    val size = dataframe.count()

    // Print the size
    println(s"DataFrame size: $size")

    // Stop SparkSession
    spark.stop()
  }
}
Output:
DataFrame size: 0
Explanation:
- We create a SparkSession.
- Define a sample DataFrame. You should replace this with your actual DataFrame.
- We use the count() function to get the size of the DataFrame, i.e., the number of rows it contains.
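Note that count() is an action: it launches a Spark job that scans every partition to produce the total. If you only need to know whether a DataFrame has any rows at all, the isEmpty method (available since Spark 2.4) can be cheaper, because it can stop as soon as one row is found. A minimal sketch, using the same local SparkSession setup as above:

```scala
import org.apache.spark.sql.SparkSession

object EmptinessCheck {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("EmptinessCheck")
      .master("local[*]")
      .getOrCreate()

    // An empty DataFrame for illustration; replace with your actual DataFrame
    val df = spark.emptyDataFrame

    // isEmpty avoids counting every row: it only needs to check
    // whether at least one row exists
    println(s"Is the DataFrame empty? ${df.isEmpty}")

    // Stop SparkSession
    spark.stop()
  }
}
```

Prefer isEmpty over count() == 0 for emptiness checks on large DataFrames, since the latter always performs a full count.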
Example #2:
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DataFrameSizeCheck {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("DataFrameSizeCheck")
      .master("local[*]")
      .getOrCreate()

    // Sample data for DataFrame
    val data = Seq(
      (1, "John"),
      (2, "Alice"),
      (3, "Bob")
    )

    // Define the schema
    val schema = StructType(
      Seq(
        StructField("ID", IntegerType, nullable = false),
        StructField("Name", StringType, nullable = false)
      )
    )

    // Create DataFrame: createDataFrame with an explicit schema expects an RDD of Rows,
    // so convert the tuples to Rows first
    val rowRDD = spark.sparkContext.parallelize(data).map { case (id, name) => Row(id, name) }
    val dataframe: DataFrame = spark.createDataFrame(rowRDD, schema)

    // Get the size of the DataFrame
    val size = dataframe.count()

    // Print the size
    println(s"DataFrame size: $size")

    // Stop SparkSession
    spark.stop()
  }
}
Output:
DataFrame size: 3
Explanation:
- We create a SparkSession.
- Define some sample data in the form of tuples.
- Define the schema for the DataFrame, specifying the data types of each column.
- Create a DataFrame using the createDataFrame method, passing the sample data and schema.
- Use the count() function to get the size of the DataFrame.
- Print the size of the DataFrame.
- Finally, we stop the SparkSession.
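count() only reports the number of rows. If you also want the number of columns, the columns method returns the column names as an array, so its length gives the column count; unlike count(), this comes from the schema and does not trigger a job. A short sketch, rebuilding the sample data from Example #2 with the toDF helper:

```scala
import org.apache.spark.sql.SparkSession

object DataFrameDimensions {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("DataFrameDimensions")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in Example #2
    val dataframe = Seq((1, "John"), (2, "Alice"), (3, "Bob")).toDF("ID", "Name")

    val rows = dataframe.count()          // number of rows (runs a Spark job)
    val cols = dataframe.columns.length   // number of columns (read from the schema)
    println(s"DataFrame dimensions: $rows rows x $cols columns")

    // Stop SparkSession
    spark.stop()
  }
}
```

For this sample data the output is "DataFrame dimensions: 3 rows x 2 columns".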