How to Restrict Results to Top N Rows per Group in MySQL
When working with MySQL databases, the need to limit results to the top N rows per group is a common requirement. This is particularly useful in scenarios where you aim to retrieve a specific number of records for each distinct group in your dataset. Achieving this involves leveraging the ROW_NUMBER() window function along with the PARTITION BY clause to effectively partition the data into groups.
How to Restrict Results to Top N Rows per Group in MySQL
When dealing with databases, limiting results to the top N rows within specific groups is crucial. We’ll explore three approaches in MySQL to efficiently handle this scenario.
- Using Subquery with
ROW_NUMBER()
- Using Common Table Expression (CTE)
- Using
DENSE_RANK()
for Tie-Breaking
Syntax:
SELECT *
FROM (
SELECT
your_columns,
ROW_NUMBER() OVER (PARTITION BY your_grouping_column ORDER BY your_ordering_column DESC) AS row_num
FROM
your_table
) ranked
WHERE
row_num <= N;
Using ROW_NUMBER() and PARTITION BY
The ROW_NUMBER() function, combined with PARTITION BY, allows efficient grouping and filtering based on assigned row numbers.
SELECT
your_columns,
ROW_NUMBER() OVER (PARTITION BY your_grouping_column ORDER BY your_ordering_column DESC) AS row_num
FROM
your_table;
Example 1: Restricting Sales Data to the Top 2 Products per Category
Let’s consider a scenario where you have a table named sales with columns product, category, and revenue. The goal is to retrieve the top 2 products based on revenue for each category.
-- Sample Data
CREATE TABLE sales (
product VARCHAR(50),
category VARCHAR(50),
revenue INT
);
INSERT INTO sales VALUES ('Laptop', 'Electronics', 1000);
INSERT INTO sales VALUES ('Smartphone', 'Electronics', 1200);
INSERT INTO sales VALUES ('Refrigerator', 'Appliances', 800);
INSERT INTO sales VALUES ('Washing Machine', 'Appliances', 900);
INSERT INTO sales VALUES ('Tablet', 'Electronics', 500);
INSERT INTO sales VALUES ('Coffee Maker', 'Appliances', 600);
-- Query to Retrieve Top 2 Products per Category
SELECT
product,
category,
revenue
FROM (
SELECT
product,
category,
revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS row_num
FROM
sales
) ranked
WHERE
row_num <= 2;
Output:
product |
category |
revenue |
---|---|---|
Smartphone |
Electronics |
1200 |
Laptop |
Electronics |
1000 |
Washing Machine |
Appliances |
900 |
Refrigerator |
Appliances |
800 |
Explanation: The query retrieves the top 2 products per category from the ‘sales’ table based on descending revenue. The output displays columns ‘product‘, ‘category‘, and ‘revenue‘. It utilizes the ROW_NUMBER() window function to assign row numbers within each category, and filters rows where the row number is 1 or 2.
Example 2: Retrieving Top 3 Employees per Department Based on Salary
Consider a table named employees with columns employee_id, department, and salary. You want to retrieve the top 3 employees based on salary for each department.
-- Sample Data
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
department VARCHAR(50),
salary INT
);
INSERT INTO employees VALUES (1, 'HR', 50000);
INSERT INTO employees VALUES (2, 'IT', 60000);
INSERT INTO employees VALUES (3, 'HR', 55000);
INSERT INTO employees VALUES (4, 'IT', 65000);
INSERT INTO employees VALUES (5, 'Finance', 70000);
-- Query to Retrieve Top 3 Employees per Department
SELECT
employee_id,
department,
salary
FROM (
SELECT
employee_id,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM
employees
) ranked
WHERE
row_num <= 3;
Output:
employee_id |
department |
salary |
---|---|---|
2 |
IT |
60000 |
4 |
IT |
65000 |
5 |
Finance |
70000 |
1 |
HR |
50000 |
3 |
HR |
55000 |
Explanation: The output displays the top 3 employees with the highest salaries in each department. It includes their ‘employee_id‘, ‘department‘, and ‘salary’. The result is obtained by ranking employees within each department based on salary in descending order using ROW_NUMBER().
Using Common Table Expression (CTE)
Utilizing a Common Table Expression enhances readability and provides a clear structure for the query.
Syntax:
WITH ranked_data AS (
SELECT
your_columns,
ROW_NUMBER() OVER (PARTITION BY your_grouping_column ORDER BY your_ordering_column DESC) AS row_num
FROM
your_table
)
SELECT *
FROM ranked_data
WHERE row_num <= N;
Example
WITH ranked_data AS (
SELECT
product,
category,
revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS row_num
FROM
sales
)
SELECT *
FROM ranked_data
WHERE row_num <= 2;
Output:
| product | category | revenue | row_num |
|------------------|-------------|---------|---------|
| Smartphone | Electronics | 1200 | 1 |
| Laptop | Electronics | 1000 | 2 |
| Washing Machine | Appliances | 900 | 1 |
| Refrigerator | Appliances | 800 | 2 |
| Coffee Maker | Appliances | 600 | 3 |
Using DENSE_RANK()
Use DENSE_RANK() instead of ROW_NUMBER() for scenarios where tied rows should share the same rank.
Syntax:
SELECT
your_columns,
DENSE_RANK() OVER (PARTITION BY your_grouping_column ORDER BY your_ordering_column DESC) AS rank
FROM
your_table;
Example:
SELECT
product,
category,
revenue,
DENSE_RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rank
FROM
sales;
Output:
| product | category | revenue | rank |
|------------------|-------------|---------|------|
| Smartphone | Electronics | 1200 | 1 |
| Laptop | Electronics | 1000 | 2 |
| Tablet | Electronics | 500 | 3 |
| Washing Machine | Appliances | 900 | 1 |
| Refrigerator | Appliances | 800 | 2 |
| Coffee Maker | Appliances | 600 | 3 |
The result shows the products, their respective categories, revenues, and the calculated ranks. Adjust the column names and table accordingly for your specific use case.
Conclusion
So, overall to Restricting results to the top N rows per group in MySQL is a powerful capability that can be achieved by using the ROW_NUMBER() window function along with PARTITION BY. This technique allows you to efficiently filter and retrieve specific records for each distinct group in your data, providing valuable insights into top-performing items or individuals within each category. Whether it’s sales data, employee records, or any other dataset, the flexibility of this approach makes it a valuable tool for data analysis and reporting in MySQL.
Contact Us