How to use a SQL query in PySpark
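Here we will run a SQL query from PySpark. First, register the DataFrame as a temporary view; the view lives only as long as the SparkSession that created it. createTempView() creates the view and fails if a view with that name already exists, while registerTempTable() creates the view if it is missing and replaces it otherwise. registerTempTable() has been deprecated since Spark 2.0 in favor of createOrReplaceTempView(), which behaves the same way.
Once the view exists, query it with spark.sql(), which takes the SQL statement as a string. Selecting a literal, such as 1 as newCol, adds a column that holds the same value in every row.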
Python3
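# Register the DataFrame as a temporary view
# (createOrReplaceTempView() replaces the deprecated registerTempTable())
df.createOrReplaceTempView('table')

# Select every column plus a constant column newCol set to 1
newDF = spark.sql('select *, 1 as newCol from table')
newDF.show()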
How to add a constant column in a PySpark DataFrame?
In this article, we will see how to add a constant column to a PySpark DataFrame.
It can be done in these ways:
- Using lit()
- Using a SQL query
Creating a DataFrame for demonstration:
Python3
# Create a spark session
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.appName('SparkExamples').getOrCreate()

# Create a spark dataframe
columns = ["Name", "Course_Name", "Months", "Course_Fees",
           "Discount", "Start_Date", "Payment_Done"]
data = [
    ("Amit Pathak", "Python", 3, 10000, 1000, "02-07-2021", True),
    ("Shikhar Mishra", "Soft skills", 2, 8000, 800, "07-10-2021", False),
    ("Shivani Suvarna", "Accounting", 6, 15000, 1500, "20-08-2021", True),
    ("Pooja Jain", "Data Science", 12, 60000, 900, "02-12-2021", False),
]
df = spark.createDataFrame(data).toDF(*columns)

# View the dataframe
df.show()