Bucket Aggregation in Elasticsearch

Elasticsearch is a robust tool not only for full-text search but also for data analytics. One of the core features that make Elasticsearch powerful is its aggregation framework, particularly bucket aggregations. Bucket aggregations allow you to group documents into buckets based on certain criteria, making it easier to analyze and summarize your data.

This article explains what bucket aggregations are and how they work, and provides detailed examples to help you understand their usage.

What are Bucket Aggregations?

Bucket aggregations in Elasticsearch are used to group documents into different buckets based on specified criteria. Each bucket can contain multiple documents that match the criteria. Unlike metric aggregations, which calculate metrics on numeric fields, bucket aggregations focus on grouping data.

Bucket aggregations can be combined with metric aggregations to perform complex analytics. For instance, you can group documents by a field (like category) and then calculate the average price within each group.
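As a rough illustration of that combination, the following Python sketch builds a search request body in which an avg metric aggregation is nested inside a terms bucket aggregation. The function name and the aggregation names by_group and avg_value are hypothetical labels chosen for this example; the field names follow the sample data used in this article.

```python
import json


def terms_with_avg(group_field: str, metric_field: str) -> dict:
    """Build a search body: one terms bucket per group_field value,
    with an avg metric computed inside each bucket."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "by_group": {  # hypothetical aggregation name
                "terms": {"field": group_field},
                # Sub-aggregations are declared in a nested "aggs" object
                "aggs": {
                    "avg_value": {  # hypothetical aggregation name
                        "avg": {"field": metric_field}
                    }
                },
            }
        },
    }


body = terms_with_avg("category.keyword", "price")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a GET /products/_search request; each bucket in the response would then carry an avg_value entry alongside its doc_count.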

Types of Bucket Aggregations

Elasticsearch provides several types of bucket aggregations, each suited for different grouping scenarios:

  • Terms Aggregation: Groups documents based on unique values in a field.
  • Histogram Aggregation: Groups numeric data into buckets of a fixed size.
  • Date Histogram Aggregation: Groups date values into buckets of a fixed time interval.
  • Range Aggregation: Groups numeric data into custom ranges.
  • Date Range Aggregation: Groups date values into custom date ranges.
  • Filter Aggregation: Groups documents that match a specific filter.
  • Filters Aggregation: Groups documents based on multiple filters.
  • Significant Terms Aggregation: Finds terms that are unusually frequent in a subset of documents compared with the index as a whole.
  • Geohash Grid Aggregation: Groups geo-point data into geohash cells.

Example Dataset

To illustrate bucket aggregations, let’s consider an Elasticsearch index called products with documents like this:

{
  "product_id": 1,
  "name": "Laptop",
  "category": "electronics",
  "price": 1000,
  "quantity_sold": 5,
  "rating": 4.5,
  "sold_date": "2023-01-15"
}

Terms Aggregation

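The terms aggregation groups documents based on the unique values in a field. Let’s group products by their category. The query uses the category.keyword sub-field, which assumes the default dynamic mapping (strings indexed as text with a keyword sub-field); terms aggregations need exact values rather than analyzed text.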

Query

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword"
      }
    }
  }
}

Output

{
  "aggregations": {
    "categories": {
      "buckets": [
        {
          "key": "electronics",
          "doc_count": 5
        },
        {
          "key": "clothing",
          "doc_count": 3
        },
        {
          "key": "books",
          "doc_count": 2
        }
      ]
    }
  }
}

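In this example, products are grouped by category, and the number of products in each category is counted. By default, the terms aggregation returns only the top 10 buckets, ordered by document count; set the size parameter to return more.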
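To make the grouping concrete, here is a minimal in-memory sketch of what a terms aggregation computes. It is not how Elasticsearch implements it internally; the function name and the sample documents are assumptions made for illustration.

```python
from collections import Counter


def terms_buckets(docs, field, size=10):
    """Count documents per unique field value and return the top `size`
    buckets, ordered by doc_count descending, then by key."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


# Hypothetical sample documents
products = [
    {"name": "Laptop", "category": "electronics"},
    {"name": "Tablet", "category": "electronics"},
    {"name": "Phone", "category": "electronics"},
    {"name": "T-shirt", "category": "clothing"},
    {"name": "Jeans", "category": "clothing"},
    {"name": "Novel", "category": "books"},
]

buckets = terms_buckets(products, "category")
for bucket in buckets:
    print(bucket)
```

Documents that lack the field are simply skipped, which mirrors the way documents without a value do not contribute to any terms bucket unless a missing value is configured.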

Histogram Aggregation

The histogram aggregation groups numeric values into buckets of a specified interval. Let’s group products by price ranges with an interval of $100.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "histogram": {
        "field": "price",
        "interval": 100
      }
    }
  }
}

Output:

{
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {
          "key": 0.0,
          "doc_count": 2
        },
        {
          "key": 100.0,
          "doc_count": 3
        },
        {
          "key": 200.0,
          "doc_count": 1
        },
        {
          "key": 300.0,
          "doc_count": 4
        }
      ]
    }
  }
}

In this example, products are grouped into price ranges with an interval of $100, and the number of products in each range is counted.
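The bucket a value lands in follows from rounding it down to the nearest multiple of the interval. The sketch below applies that rule, floor((value - offset) / interval) * interval + offset, to a hypothetical list of prices. It is a simplified illustration: unlike Elasticsearch's default behavior, it does not emit empty buckets for gaps between populated ones.

```python
import math
from collections import Counter


def histogram_buckets(values, interval, offset=0.0):
    """Assign each value to the bucket whose key is the value rounded down
    to a multiple of `interval` (shifted by `offset`)."""
    counts = Counter(
        math.floor((v - offset) / interval) * interval + offset for v in values
    )
    return [{"key": float(key), "doc_count": counts[key]} for key in sorted(counts)]


# Hypothetical prices
prices = [20, 75, 120, 150, 199.99, 250, 300, 310]

buckets = histogram_buckets(prices, 100)
for bucket in buckets:
    print(bucket)
```

Each bucket covers the half-open interval from its key up to, but not including, the next key, so a price of exactly 300 belongs to the bucket with key 300.0.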

Date Histogram Aggregation

The date histogram aggregation groups date values into buckets of a fixed time interval. Let’s group products by the month they were sold.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "sales_by_month": {
      "date_histogram": {
        "field": "sold_date",
        "calendar_interval": "month"
      }
    }
  }
}

Output:

{
  "aggregations": {
    "sales_by_month": {
      "buckets": [
        {
          "key_as_string": "2023-01-01T00:00:00.000Z",
          "key": 1672531200000,
          "doc_count": 5
        },
        {
          "key_as_string": "2023-02-01T00:00:00.000Z",
          "key": 1675209600000,
          "doc_count": 3
        },
        {
          "key_as_string": "2023-03-01T00:00:00.000Z",
          "key": 1677628800000,
          "doc_count": 2
        }
      ]
    }
  }
}

In this example, products are grouped by the month of their sold_date. The doc_count of each bucket is the number of product documents that fall in that month, not the sum of quantity_sold.
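The bucket keys in the output are the start of each month expressed as milliseconds since the Unix epoch in UTC. The following sketch shows one way to derive those keys from date strings; the helper names and sample dates are hypothetical, and time zones other than UTC are deliberately ignored.

```python
from collections import Counter
from datetime import datetime, timezone


def month_bucket_key(date_str: str) -> int:
    """Return epoch milliseconds (UTC) for the first instant of the date's month."""
    day = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    month_start = day.replace(day=1)
    return int(month_start.timestamp() * 1000)


def monthly_buckets(date_strings):
    """Group dates by calendar month and count the documents in each month."""
    counts = Counter(month_bucket_key(s) for s in date_strings)
    buckets = []
    for key in sorted(counts):
        start = datetime.fromtimestamp(key / 1000, tz=timezone.utc)
        buckets.append({
            "key_as_string": start.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
            "key": key,
            "doc_count": counts[key],
        })
    return buckets


# Hypothetical sold_date values
sold_dates = ["2023-01-15", "2023-01-20", "2023-02-03"]

buckets = monthly_buckets(sold_dates)
for bucket in buckets:
    print(bucket)
```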

Range Aggregation

The range aggregation groups numeric values into custom ranges. Let’s group products by custom price ranges.

Query:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 500 },
          { "from": 500 }
        ]
      }
    }
  }
}

Output:

{
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {
          "key": "*-100.0",
          "to": 100.0,
          "doc_count": 2
        },
        {
          "key": "100.0-500.0",
          "from": 100.0,
          "to": 500.0,
          "doc_count": 4
        },
        {
          "key": "500.0-*",
          "from": 500.0,
          "doc_count": 4
        }
      ]
    }
  }
}

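In this example, products are grouped into custom price ranges, and the number of products in each range is counted. Each range includes its from value and excludes its to value, so a product priced at exactly 100 falls into the second bucket.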
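The boundary behavior is easy to get wrong, so here is a small in-memory sketch of range bucketing in which the lower bound is inclusive and the upper bound is exclusive. The function name and the price list are assumptions for illustration only.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive and `to` is exclusive.
    A missing bound means the range is open on that side."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        lo_label = "*" if lo is None else str(float(lo))
        hi_label = "*" if hi is None else str(float(hi))
        buckets.append({"key": f"{lo_label}-{hi_label}", "doc_count": count})
    return buckets


# Hypothetical prices, including values that sit exactly on a boundary
prices = [50, 99.99, 100, 499, 500, 1000]
ranges = [{"to": 100}, {"from": 100, "to": 500}, {"from": 500}]

buckets = range_buckets(prices, ranges)
for bucket in buckets:
    print(bucket)
```

Because the ranges here do not overlap, every price is counted exactly once. Overlapping ranges are allowed, in which case a value would be counted in each range it matches.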

Date Range Aggregation

The date range aggregation groups date values into custom ranges. Let’s group products by custom ranges of their sold date.

Query

GET /products/_search
{
  "size": 0,
  "aggs": {
    "sales_date_ranges": {
      "date_range": {
        "field": "sold_date",
        "ranges": [
          { "to": "2023-01-01" },
          { "from": "2023-01-01", "to": "2023-02-01" },
          { "from": "2023-02-01" }
        ]
      }
    }
  }
}

Output

{
  "aggregations": {
    "sales_date_ranges": {
      "buckets": [
        {
          "key": "*-2023-01-01T00:00:00.000Z",
          "to": 1672531200000,
          "doc_count": 2
        },
        {
          "key": "2023-01-01T00:00:00.000Z-2023-02-01T00:00:00.000Z",
          "from": 1672531200000,
          "to": 1675209600000,
          "doc_count": 5
        },
        {
          "key": "2023-02-01T00:00:00.000Z-*",
          "from": 1675209600000,
          "doc_count": 3
        }
      ]
    }
  }
}

In this example, products are grouped into custom sold-date ranges, and the number of product documents in each range is counted. Each range includes its from value and excludes its to value.
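The numeric from and to values in the response are the range bounds converted to epoch milliseconds. The sketch below performs that conversion and counts dates per range; the helper names and sample dates are hypothetical, and all dates are treated as midnight UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(date_str: str) -> int:
    """Convert a yyyy-MM-dd date to epoch milliseconds at midnight UTC."""
    day = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp() * 1000)


def date_range_counts(date_strings, ranges):
    """Count dates per range; `from` is inclusive and `to` is exclusive."""
    millis = [to_epoch_millis(s) for s in date_strings]
    counts = []
    for r in ranges:
        lo = to_epoch_millis(r["from"]) if "from" in r else None
        hi = to_epoch_millis(r["to"]) if "to" in r else None
        counts.append(sum(
            1 for m in millis
            if (lo is None or m >= lo) and (hi is None or m < hi)
        ))
    return counts


# Hypothetical sold_date values; the last one sits exactly on a boundary
sold_dates = ["2022-12-31", "2023-01-15", "2023-02-01"]
ranges = [
    {"to": "2023-01-01"},
    {"from": "2023-01-01", "to": "2023-02-01"},
    {"from": "2023-02-01"},
]

counts = date_range_counts(sold_dates, ranges)
print(counts)
```

A document dated exactly 2023-02-01 is counted in the last range rather than the middle one, because the middle range stops just before its upper bound.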

Filter Aggregation

The filter aggregation groups documents that match a specific filter. Let’s group products that have a rating of 4 or higher.

Query

GET /products/_search
{
  "size": 0,
  "aggs": {
    "high_rated_products": {
      "filter": {
        "range": {
          "rating": {
            "gte": 4
          }
        }
      }
    }
  }
}

Output

{
  "aggregations": {
    "high_rated_products": {
      "doc_count": 7
    }
  }
}

In this example, we group products with a rating of 4 or higher, and the number of such products is counted.

Filters Aggregation

The filters aggregation groups documents based on multiple filters, producing one named bucket per filter. Let’s group products based on multiple rating ranges.

Query

GET /products/_search
{
  "size": 0,
  "aggs": {
    "rating_filters": {
      "filters": {
        "filters": {
          "high_rating": {
            "range": {
              "rating": {
                "gte": 4
              }
            }
          },
          "low_rating": {
            "range": {
              "rating": {
                "lt": 4
              }
            }
          }
        }
      }
    }
  }
}

Output

{
  "aggregations": {
    "rating_filters": {
      "buckets": {
        "high_rating": {
          "doc_count": 7
        },
        "low_rating": {
          "doc_count": 3
        }
      }
    }
  }
}

In this example, products are grouped based on two rating ranges, and the number of products in each range is counted.
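Conceptually, both the filter and the filters aggregations test each document against a condition and count the matches. A minimal in-memory sketch, with a hypothetical helper name and sample documents, models each named filter as a Python predicate:

```python
def filters_buckets(docs, named_filters):
    """Return one bucket per named filter with the number of matching documents.
    A document is counted in every filter it matches."""
    return {
        name: {"doc_count": sum(1 for doc in docs if predicate(doc))}
        for name, predicate in named_filters.items()
    }


# Hypothetical sample documents; every one is assumed to have a rating
products = [
    {"name": "Laptop", "rating": 4.5},
    {"name": "Headphones", "rating": 4.0},
    {"name": "Novel", "rating": 3.2},
]

named_filters = {
    "high_rating": lambda doc: doc["rating"] >= 4,  # mirrors "gte": 4
    "low_rating": lambda doc: doc["rating"] < 4,    # mirrors "lt": 4
}

result = filters_buckets(products, named_filters)
print(result)
```

With a single entry in named_filters this reduces to the filter aggregation. The two filters here are mutually exclusive, but nothing requires that: overlapping filters would count the same document in more than one bucket.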

Significant Terms Aggregation

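The significant terms aggregation finds terms that are unusually frequent in a subset of documents (the foreground set, here the query results) compared with the index as a whole (the background set). Let’s find significant terms in product names in the electronics category.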

Query

GET /products/_search
{
  "query": {
    "term": {
      "category.keyword": "electronics"
    }
  },
  "size": 0,
  "aggs": {
    "significant_terms": {
      "significant_terms": {
        "field": "name.keyword"
      }
    }
  }
}

Output

{
  "aggregations": {
    "significant_terms": {
      "buckets": [
        {
          "key": "Laptop",
          "doc_count": 3,
          "score": 0.5,
          "bg_count": 3
        },
        {
          "key": "Tablet",
          "doc_count": 2,
          "score": 0.3,
          "bg_count": 2
        }
      ]
    }
  }
}

In this example, significant terms in product names in the electronics category are identified, with their document counts and significance scores.
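To give some intuition for the score, the sketch below compares how often a term appears in the foreground set with how often it appears in the background set. It follows the commonly cited form of the JLH heuristic, which Elasticsearch documents as its default for this aggregation: the absolute change in frequency multiplied by the relative change. Treat it as an approximation for intuition rather than a reproduction of Elasticsearch's exact scoring; the function name and the counts are hypothetical.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """Score a term by how much more frequent it is in the foreground set
    than in the background set (JLH-style: absolute change * relative change)."""
    if fg_total == 0 or bg_total == 0 or bg_count == 0:
        return 0.0
    fg_pct = fg_count / fg_total  # share of foreground docs containing the term
    bg_pct = bg_count / bg_total  # share of background docs containing the term
    change = fg_pct - bg_pct
    if change <= 0:
        return 0.0  # not more frequent than in the background, so not significant
    return change * (fg_pct / bg_pct)


# Hypothetical counts: a term in 3 of 5 electronics documents
# and in 3 of 10 documents across the whole index
score = jlh_score(fg_count=3, fg_total=5, bg_count=3, bg_total=10)
print(score)
```

Here doc_count in the response corresponds to fg_count and bg_count to the term's count across the index. Since the foreground set is a subset of the index, bg_count can never be smaller than doc_count.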

Geohash Grid Aggregation

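The geohash grid aggregation groups geo-point data into geohash cells. Let’s group products by their location. This assumes each product document also has a location field mapped as geo_point; the sample document shown earlier does not include one.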

Query

GET /products/_search
{
  "size": 0,
  "aggs": {
    "geo_grid": {
      "geohash_grid": {
        "field": "location",
        "precision": 5
      }
    }
  }
}

Output

{
  "aggregations": {
    "geo_grid": {
      "buckets": [
        {
          "key": "dr5ru",
          "doc_count": 3
        },
        {
          "key": "dr5r6",
          "doc_count": 2
        }
      ]
    }
  }
}

In this example, products are grouped by their location into geohash cells with a precision of 5.
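The bucket keys are geohash strings, and the precision is the number of characters in each key. The following sketch implements the standard geohash encoding (alternately halving the longitude and latitude ranges, then packing every five bits into a base-32 character) and groups a few hypothetical coordinates by cell. The function names and sample points are assumptions for illustration.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = bits * 2 + 1, mid
            else:
                bits, lon_hi = bits * 2, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = bits * 2 + 1, mid
            else:
                bits, lat_hi = bits * 2, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_buckets(points, precision=5):
    """Group (lat, lon) points by geohash cell and count the points per cell."""
    counts = Counter(geohash_encode(lat, lon, precision) for lat, lon in points)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered]


# Hypothetical locations: two nearby points and one far away
points = [(57.64911, 10.40744), (57.64912, 10.40745), (-33.8688, 151.2093)]

buckets = geohash_buckets(points)
for bucket in buckets:
    print(bucket)
```

Raising the precision yields longer keys and smaller cells, so nearby points that share a bucket at precision 5 may separate into different buckets at higher precision.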

Conclusion

Bucket aggregations in Elasticsearch are a powerful tool for grouping and analyzing data based on various criteria. By understanding and using different types of bucket aggregations, you can perform complex analytics and gain valuable insights into your data. Whether you’re analyzing sales data, user behavior, or any other type of information, bucket aggregations provide a flexible and efficient way to summarize and explore your data in Elasticsearch.


