How to use withColumn() in PySpark

withColumn() is used to change the values of an existing column, convert its datatype, or create a new column.

Syntax: df.withColumn(colName, col)

Returns: A new DataFrame with the column added, or with the existing column of the same name replaced.

Python3

new_df = df.withColumn('After_discount', 
                       df.Course_Fees - df.Discount) 
new_df.show()


PySpark dataframe add column based on other columns

In this article, we will see how to add a column to a PySpark DataFrame based on the values of other columns.

Creating Dataframe for demonstration:

Here we create a DataFrame from a list of tuples, one tuple per row of the dataset.

Python3

# Create a spark session 
from pyspark.sql import SparkSession 
spark = SparkSession.builder.appName('SparkExamples').getOrCreate() 
  
# Create a spark dataframe 
columns = ["Name", "Course_Name", 
           "Months", 
           "Course_Fees", "Discount", 
           "Start_Date", "Payment_Done"] 
data = [ 
    ("Amit Pathak", "Python", 3, 10000, 1000, 
     "02-07-2021", True), 
    ("Shikhar Mishra", "Soft skills", 2, 
     8000, 800, "07-10-2021", False), 
    ("Shivani Suvarna", "Accounting", 6, 
     15000, 1500, "20-08-2021", True), 
    ("Pooja Jain", "Data Science", 12, 
     60000, 900, "02-12-2021", False), 
] 
  
df = spark.createDataFrame(data, schema=columns) 
  
# View the dataframe 
df.show() 



Method 2: Using SQL query
