PARTITION BY vs GROUP BY in SQL
In SQL both PARTITION BY and GROUP BY are important clauses used for data aggregation and analysis. Sometimes they work as same but they serve different purposes and are applied in different situations. In this article, we’ll understand both of them along with the syntax, multiple examples for both clauses and also the differences between them.
What is the PARTITION BY Clause?
The PARTITION BY clause is a type of clause that is used with window functions to divide the result set into partitions on which the function is applied. It allows us to perform calculations and aggregations within each partition independently.
Syntax:
SELECT column1, column2, ..., function(column) OVER (PARTITION BY column_name1, column_name2, ...)
FROM table_name;
Explanation: Here the function(column)
is the window function that we want to apply and column_name1, column_name
2
are the columns which are used to partition the result set.
What is GROUP BY Clause?
The GROUP BY clause is used to aggregate data based on one or more columns and group rows with identical values into summary rows. It is widely used via aggregate functions like SUM, COUNT, and AVG to perform calculations on each group.
Syntax:
SELECT column1, column2, aggregate_function(column3)
FROM table_name
WHERE condition
GROUP BY column1, column2;
Explanation:
column1
,column2
are the columns that we want to group by.aggregate_function
is the function likeSUM
,COUNT
,AVG
,MAX
,MIN
that we want to apply to the grouped data incolumn3
.table_name
is the name of the table.condition
is an optional condition to filter the rows before grouping.
Examples of PARTITION BY and GROUP BY
To understand PARTITION BY vs GROUP BY in SQL we need a table on which we will perform various operations and queries. Here we will consider a table called sales which contains product_id, category and sales_amount as Columns.
Query:
CREATE TABLE sales (
product_id INT,
category VARCHAR(50),
sales_amount DECIMAL(10, 2)
);
INSERT INTO sales (product_id, category, sales_amount) VALUES (1, 'Electronics' , 500);
INSERT INTO sales (product_id, category, sales_amount) VALUES (2, 'Clothing' , 300);
INSERT INTO sales (product_id, category, sales_amount) VALUES (3, 'Electronics' , 700);
INSERT INTO sales (product_id, category, sales_amount) VALUES (4, 'Clothing' , 400);
INSERT INTO sales (product_id, category, sales_amount) VALUES (5, 'Electronics' , 600);
Our sales table looks like:
Examples of GROUP BY Clause
Example 1: Total Sales Amount by Category
Query:
SELECT category, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY category;
Output
Explanation: In the above query, We uses GROUP BY
to group rows by category and SUM
()
to calculate the total sales amount for each category in the sales table.
Example 2: Average Sales Amount by Category
Query:
SELECT category, AVG(sales_amount) AS avg_sales
FROM sales
GROUP BY category;
Output:
Explanation: In the above query, We uses GROUP BY
to group rows by category and AVG
()
to calculate the average sales amount for each category in the sales table.
Examples of PARTITION BY Clause
Example 1: Rank of Products within Each Category
Query:
SELECT product_id, category, sales_amount,
RANK() OVER (PARTITION BY category ORDER BY sales_amount DESC) AS rank_within_category
FROM sales;
Output:
Explanation: In the above query, We calculates the rank of each product within its category based on the sales amount, using the RANK()
window function with PARTITION BY
to group products by category.
Example 2: Cumulative Sales Amount within Each Category
SELECT product_id, category, sales_amount,
SUM(sales_amount) OVER (PARTITION BY category ORDER BY product_id) AS cumulative_sales
FROM sales;
Output:
Explanation: In the above query, We calculates the cumulative sales amount for each product within its category, using the SUM
()
window function with PARTITION
BY
to group products by category and ordering by product_id
.
Difference Between PARTITION BY and GROUP BY
Let’s compare PARTITION BY and GROUP BY in a tabular format.
Feature |
PARTITION BY |
GROUP BY |
---|---|---|
Purpose |
It is Used with window functions for partitioning data |
It is Used for aggregating data based on one or more columns |
Aggregation |
It Performs calculations within each partition |
It Aggregates data across the entire result set |
Output |
It Retains all rows in the output |
It Aggregates rows and summarizes data into groups |
Result |
It gives Set Size Result set size may remain unchanged |
It gives Result set size may decrease due to aggregation |
Conclusion
Overall, Understanding the differences between PARTITION BY and GROUP BY is important for effective data analysis and aggregation in SQL. While GROUP BY is used for summarizing data into groups, PARTITION BY allows for more advanced calculations within each partition. We have saw various examples of PARTITION BY and GROUP BY to get the better understanding of them.
Contact Us