Aggregation in MongoDB

Aggregation in MongoDB is a powerful feature that allows for complex data transformations and computations on collections of documents. It enables users to group, filter, and manipulate data to produce summarized results.

It is typically performed using the MongoDB Aggregation Pipeline, a framework for data aggregation modeled on the concept of data processing pipelines. In this article, we will learn about aggregation in MongoDB in detail by covering its main aspects.

What is aggregation?

  • MongoDB Aggregation is a database process that allows us to perform complex data transformations and computations on collections of documents.
  • It enables us to group, filter, and manipulate data to produce summarized results. MongoDB Aggregation is typically carried out using the aggregation pipeline, which is a framework for data aggregation modeled on the concept of data processing pipelines.
  • Each stage of the pipeline transforms the documents as they pass through it, allowing for operations like filtering, grouping, sorting, reshaping and performing calculations on the data.

Single-purpose aggregation

  • It is used when we need only simple access to the documents, such as counting the number of documents in a collection or finding all distinct values of a field.
  • It exposes common aggregation operations through the count(), distinct() and estimatedDocumentCount() methods, and therefore lacks the flexibility and capabilities of the pipeline.
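A minimal sketch of these methods in mongosh, assuming the users collection used later in this article. The helper name singlePurposeSummary is hypothetical, and countDocuments() is used in place of count(), which recent shell versions mark as deprecated.

```javascript
// Hypothetical helper: gathers the single-purpose results for one collection.
function singlePurposeSummary(collection) {
  return {
    // Exact number of documents matching the (empty) filter
    exactCount: collection.countDocuments({}),
    // Fast estimate based on collection metadata
    estimatedCount: collection.estimatedDocumentCount(),
    // Distinct values of the city field across the collection
    cities: collection.distinct("city")
  };
}

// `db` only exists inside mongosh, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(singlePurposeSummary(db.users));
}
```

None of these methods can group or reshape documents, which is where the pipeline comes in.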

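Example: counting users per city

Single-purpose methods cannot group their results, so a per-group count needs the aggregation pipeline. Let’s consider an example where we find the total number of users in each city from the users collection.

db.users.aggregate([
  // Group users by city and add 1 to the total for every document in the group
  { $group: { _id: "$city", totalUsers: { $sum: 1 } } }
])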

Output:

[
  { _id: 'Los Angeles', totalUsers: 1 },
  { _id: 'New York', totalUsers: 1 },
  { _id: 'Chicago', totalUsers: 1 }
]

In this example, the aggregation pipeline first groups the documents by the city field and then uses the $sum accumulator to count the number of documents (users) in each city.

The result will be a list of documents, each containing the city (_id) and the total number of users (totalUsers) in that city.

How to use MongoDB to Aggregate Data?

To use MongoDB for aggregating data, follow the steps below:

  1. Connect to MongoDB: Ensure you are connected to your MongoDB instance.
  2. Choose the Collection: Select the collection you want to perform aggregation on, such as students.
  3. Define the Aggregation Pipeline: Create an array of stages, like $group to group documents and perform operations (e.g., calculate the average grade).
  4. Run the Aggregation Pipeline: Use the aggregate method on the collection with your defined pipeline.

Example:

db.students.aggregate([
  {
    $group: {
      _id: null,                        // null puts every document into a single group
      averageGrade: { $avg: "$grade" }  // average of the grade field across that group
    }
  }
])

This calculates the average grade of all students in the students collection.

MongoDB Aggregation Pipeline

  • The MongoDB aggregation pipeline consists of stages, and each stage transforms the documents. It is a multi-stage pipeline: each stage takes documents as input and produces a resulting set of documents.
  • The next stage (if available) takes those resulting documents as its input and produces its own output; this process continues until the last stage.
  • The basic pipeline stages provide:
    1. filters that operate like queries.
    2. document transformations that modify the resulting documents.
    3. tools for grouping and sorting documents.
  • The aggregation pipeline can also be used on sharded collections.

Let us discuss the aggregation pipeline with the help of an example:
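
A minimal sketch of such a pipeline, written from the explanation that follows. The collection name trainFares, the variable name firstClassFareTotals, the output field total and the document shape (id, class and fare fields) are assumptions made for illustration.

```javascript
// Assumed document shape: { id: ..., class: "first-class", fare: ... }
const firstClassFareTotals = [
  // Stage 1: $match is a stage that filters documents like a query.
  { $match: { class: "first-class" } },
  // Stage 2: $group is a stage, "$id" and "$fare" are field-path expressions,
  // and $sum is an accumulator that adds up fare within each group.
  { $group: { _id: "$id", total: { $sum: "$fare" } } }
];

// `db` only exists inside mongosh, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.trainFares.aggregate(firstClassFareTotals).toArray());
}
```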

Explanation:

In this example, on a collection of “train fares”, the $match stage runs first: it keeps only the documents whose class field is “first-class” and passes them to the second stage.

In the second stage, $group groups the documents by the id field and calculates the sum of fare for each unique id.

Here, the aggregate() function is used to perform the aggregation. It works with three kinds of building blocks: stages, expressions, and accumulators. These work together to achieve the desired outcome.

Aggregation Pipeline Method

To understand the aggregation pipeline method, let’s imagine a collection named users with the following documents for our examples.

{
  "_id": ObjectId("60a3c7e96e06f64fb5ac0700"),
  "name": "Alice",
  "age": 30,
  "email": "alice@example.com",
  "city": "New York"
}
{
  "_id": ObjectId("60a3c7e96e06f64fb5ac0701"),
  "name": "Bob",
  "age": 35,
  "email": "bob@example.com",
  "city": "Los Angeles"
}
{
  "_id": ObjectId("60a3c7e96e06f64fb5ac0702"),
  "name": "Charlie",
  "age": 25,
  "email": "charlie@example.com",
  "city": "Chicago"
}

1. $group: Groups documents by a specified key and applies accumulators to each group. Here it groups by the city field and calculates the average age using the $avg accumulator.

db.users.aggregate([
  { $group: { _id: "$city", averageAge: { $avg: "$age" } } }
])

Output:

[
  { _id: 'New York', averageAge: 30 },
  { _id: 'Chicago', averageAge: 25 },
  { _id: 'Los Angeles', averageAge: 35 }
]

2. $project: Includes or excludes fields from the output documents. Here it keeps name and city and drops _id.

db.users.aggregate([
  { $project: { name: 1, city: 1, _id: 0 } }
])

Output:

[
  { name: 'Alice', city: 'New York' },
  { name: 'Bob', city: 'Los Angeles' },
  { name: 'Charlie', city: 'Chicago' }
]

3. $match: Filters documents so that only those matching the specified condition(s) pass to the next stage.

db.users.aggregate([
  // Keep only users older than 30
  { $match: { age: { $gt: 30 } } }
])

Output:

[
  {
    _id: ObjectId('60a3c7e96e06f64fb5ac0701'),
    name: 'Bob',
    age: 35,
    email: 'bob@example.com',
    city: 'Los Angeles'
  }
]

4. $sort: Orders the documents by the given field(s); 1 sorts in ascending order and -1 in descending order.

db.users.aggregate([
  // Sort users from youngest to oldest
  { $sort: { age: 1 } }
])

Output:

[
  {
    _id: ObjectId('60a3c7e96e06f64fb5ac0702'),
    name: 'Charlie',
    age: 25,
    email: 'charlie@example.com',
    city: 'Chicago'
  },
  {
    _id: ObjectId('60a3c7e96e06f64fb5ac0700'),
    name: 'Alice',
    age: 30,
    email: 'alice@example.com',
    city: 'New York'
  },
  {
    _id: ObjectId('60a3c7e96e06f64fb5ac0701'),
    name: 'Bob',
    age: 35,
    email: 'bob@example.com',
    city: 'Los Angeles'
  }
]

5. $limit: Limits the number of documents passed to the next stage.

db.users.aggregate([
  { $limit: 2 }
])

Output:

[
  {
    _id: ObjectId('60a3c7e96e06f64fb5ac0700'),
    name: 'Alice',
    age: 30,
    email: 'alice@example.com',
    city: 'New York'
  },
  {
    _id: ObjectId('60a3c7e96e06f64fb5ac0701'),
    name: 'Bob',
    age: 35,
    email: 'bob@example.com',
    city: 'Los Angeles'
  }
]
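
The stages shown individually above can also be chained in a single pipeline, with each stage consuming the output of the previous one. A minimal sketch against the sample users collection; the variable name oldestUserOver25 is only illustrative.

```javascript
const oldestUserOver25 = [
  { $match: { age: { $gt: 25 } } },          // keep users older than 25
  { $sort: { age: -1 } },                    // oldest first
  { $limit: 1 },                             // keep only the first document
  { $project: { _id: 0, name: 1, age: 1 } }  // return just name and age
];

// `db` only exists inside mongosh, so the call is guarded.
if (typeof db !== "undefined") {
  printjson(db.users.aggregate(oldestUserOver25).toArray());
}
```

With the three sample documents, $match keeps Alice and Bob, $sort puts Bob first, $limit keeps one document, and $project leaves only the name and age of Bob.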

How Fast is MongoDB Aggregation?

  • The speed of MongoDB aggregation depends on various factors such as the complexity of the aggregation pipeline, the size of the data set, the hardware specifications of the MongoDB server and the efficiency of the indexes.
  • In general, MongoDB’s aggregation framework is designed to efficiently process large volumes of data and complex aggregation operations. When used correctly, it can provide fast and scalable aggregation capabilities.
  • As with any database operation, performance varies with the specific use case and configuration. It is important to optimize our aggregation queries, use indexes where appropriate, and ensure that our MongoDB server is properly configured.
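
One common way to apply this advice is to filter as early as possible and give that filter an index to use. The sketch below assumes the sample users collection; the index on age and the variable name usersOver30ByCity are illustrative choices, not requirements.

```javascript
// $match comes first so that later stages see fewer documents;
// a $match at the start of a pipeline can also make use of an index.
const usersOver30ByCity = [
  { $match: { age: { $gt: 30 } } },
  { $group: { _id: "$city", count: { $sum: 1 } } }
];

// `db` only exists inside mongosh, so the calls are guarded.
if (typeof db !== "undefined") {
  // Index the field used by the leading $match
  db.users.createIndex({ age: 1 });
  // Ask the server how it executes the pipeline, including index usage
  printjson(db.users.explain("executionStats").aggregate(usersOver30ByCity));
}
```

Comparing the explain output before and after creating the index is a simple way to check whether the pipeline actually benefits from it.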

Conclusion

MongoDB Aggregation is a powerful tool for performing complex data transformations and computations. It allows for flexible and efficient aggregation operations using the aggregation pipeline. Single-purpose aggregation provides access to common aggregation processes for simple tasks, while the aggregation pipeline offers more flexibility and capabilities for complex operations.


