Exploring Elasticsearch Cluster Architecture and Node Roles

Elasticsearch’s cluster architecture and node roles are fundamental to building scalable and fault-tolerant search infrastructures. A cluster comprises interconnected nodes, each serving specific roles like master, data, ingest, or coordinating-only. Understanding these components is crucial for efficient cluster management and performance.

In this article, we will learn about Elasticsearch cluster architecture and node roles, and then work through practical examples in detail.

Elasticsearch Cluster Architecture

Elasticsearch clusters are built to be highly scalable and fault-tolerant, allowing them to handle large volumes of data and queries efficiently. The architecture of an Elasticsearch cluster consists of several key components:

Nodes: Nodes are individual instances of Elasticsearch running on a server. Each node can be configured to perform specific roles within the cluster, such as master-eligible, data, ingest or coordinating-only.
Master Node: The master node is elected from the master-eligible nodes and is responsible for cluster-wide management tasks, such as creating or deleting indices, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes.
Data Node: Data nodes are responsible for storing and managing the actual data in the cluster. They handle indexing requests, store data in shards and execute search queries. Data nodes can hold multiple primary and replica shards, distributing the data across the cluster for scalability and fault tolerance.
Ingest Node: Ingest nodes are used for pre-processing documents before they are indexed. They can apply transformations, enrichments, or other processing steps to the data. Ingest nodes help offload processing tasks from data nodes, improving overall cluster performance.
Coordinating-Only Node: Coordinating-only nodes do not hold any data or participate in the master election process. Their main role is to act as a proxy for client requests, distributing search and indexing requests to the appropriate data nodes.
Shards: Shards are the basic units of data in Elasticsearch. Each index is divided into multiple shards, which can be distributed across the cluster. This allows Elasticsearch to parallelize operations and scale horizontally.
Replicas: Replicas are copies of primary shards, and a replica is never allocated to the same node as its primary. Replicas serve two main purposes: they improve search throughput, because a search can be served by either a primary or a replica copy, and they provide fault tolerance, because a replica can be promoted to primary if the node holding the primary fails.
Cluster State: The cluster state is the cluster’s metadata: index mappings and settings, the list of nodes, and the location of every shard. The elected master node is the only node that updates it, and it publishes each new version to all other nodes in the cluster.
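To make the shard and replica arithmetic concrete, here is a minimal Python sketch. The function names and the example index are illustrative assumptions, not part of any Elasticsearch API. It relies only on two facts: each primary shard has number_of_replicas copies, and copies of the same shard are placed on different nodes.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies for one index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def min_data_nodes_for_full_allocation(replicas: int) -> int:
    """Copies of the same shard must live on different nodes, so all
    copies can be assigned only if there are replicas + 1 data nodes."""
    return replicas + 1


# Hypothetical index: 3 primary shards, 1 replica of each.
print(total_shard_copies(3, 1))               # 6
print(min_data_nodes_for_full_allocation(1))  # 2
```

With a single data node and one replica configured, the replica copies would stay unassigned, which is a common reason for a yellow cluster status.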

Node Roles in Elasticsearch

Elasticsearch nodes can assume different roles based on their configurations and responsibilities within the cluster. The common node roles include:

1. Master-eligible Nodes

Master-eligible nodes participate in the election process to elect a master node responsible for cluster-wide management tasks.
They maintain cluster state, coordinate node additions or removals, and handle administrative actions like creating or deleting indices.
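Typically, it’s recommended to have at least three master-eligible nodes: electing a master requires a majority of them, so with three nodes the cluster can lose one and still elect a master, and two isolated halves can never both form a majority (the split-brain scenario).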
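The reasoning behind the three-node recommendation is plain majority arithmetic, sketched below in Python. This is a simplified model with hypothetical function names; Elasticsearch manages its own voting configuration, so treat this only as the underlying calculation.

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible - quorum(master_eligible)


for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

In this model, going from one to two master-eligible nodes does not add any failure tolerance, which is why three is the usual minimum.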

2. Data Nodes

Data nodes store and manage indexed documents and handle data-related operations such as indexing, search, and retrieval.
They store shards (partitions of indices) and replicate data for fault tolerance.
Adding more data nodes increases the storage capacity and improves search performance by distributing the workload.

3. Ingest Nodes

Ingest nodes are responsible for preprocessing documents before indexing.
They can apply transformations, enrich data, or extract specific fields from incoming documents using ingest pipelines.
Ingest nodes are optional but useful for offloading preprocessing tasks from data and master nodes.
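Conceptually, an ingest pipeline is an ordered list of processors, each of which takes a document and returns a modified one. The following Python sketch is only an analogy for that idea: the helper names (lowercase, set_field, run_pipeline) and the sample document are hypothetical and do not call Elasticsearch.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]
Processor = Callable[[Doc], Doc]


def lowercase(field: str) -> Processor:
    """Build a processor that lowercases a string field, if present."""
    def apply(doc: Doc) -> Doc:
        value = doc.get(field)
        if isinstance(value, str):
            return {**doc, field: value.lower()}
        return doc
    return apply


def set_field(field: str, value: object) -> Processor:
    """Build a processor that sets a field to a fixed value."""
    def apply(doc: Doc) -> Doc:
        return {**doc, field: value}
    return apply


def run_pipeline(doc: Doc, processors: List[Processor]) -> Doc:
    """Apply each processor in order, as a pipeline would before indexing."""
    for processor in processors:
        doc = processor(doc)
    return doc


pipeline = [lowercase("level"), set_field("environment", "staging")]
print(run_pipeline({"level": "ERROR", "message": "disk full"}, pipeline))
# {'level': 'error', 'message': 'disk full', 'environment': 'staging'}
```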

4. Coordinating-only (Client) Nodes

Coordinating-only nodes, called client nodes in older Elasticsearch versions, route search and indexing requests to the right data nodes in the cluster.
They serve as a gateway for external clients, fanning each request out to the relevant shards and merging the per-shard results into a single response.
Dedicated coordinating-only nodes can improve scalability and resilience by keeping this request routing and result merging off the data nodes.
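The "gather" half of this work can be sketched in a few lines of Python: a client (coordinating) node receives each shard's local top hits and merges them into one global result. The function name, shard labels, and scores below are hypothetical, and real search merging handles much more (sorting options, pagination, aggregations).

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def merge_top_hits(per_shard_hits: Dict[str, List[Hit]], size: int) -> List[Hit]:
    """Combine each shard's local hits into one global top-`size` list,
    highest score first."""
    all_hits = (hit for hits in per_shard_hits.values() for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


# Hypothetical per-shard results, as if returned by two data nodes.
shard_results = {
    "shard-0": [(2.4, "doc-7"), (1.1, "doc-3")],
    "shard-1": [(3.0, "doc-9"), (0.8, "doc-2")],
}
print(merge_top_hits(shard_results, size=3))
# [(3.0, 'doc-9'), (2.4, 'doc-7'), (1.1, 'doc-3')]
```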

Practical Example

Let’s consider a medium-sized Elasticsearch cluster with 5 nodes:

3 Master-eligible Nodes
2 Data Nodes

1. Define Cluster Nodes

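Let’s configure a node as a dedicated master-eligible node that neither stores data nor preprocesses documents. These settings go in the node’s elasticsearch.yml. Note that node.master, node.data, and node.ingest are legacy settings: they were deprecated in Elasticsearch 7.9 in favor of node.roles (here, node.roles: [ master ]) and removed in 8.0.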

node.master: true
node.data: false
node.ingest: false

2. Add Data Nodes

Let’s configure a node as a data node, which stores data but neither acts as a master nor preprocesses documents. In Elasticsearch 7.9 and later, the equivalent setting is node.roles: [ data ].

node.master: false
node.data: true
node.ingest: false

3. Update Cluster Settings

The API call below disables the disk-based shard allocation threshold for the whole cluster. With the threshold disabled, Elasticsearch no longer takes a node’s free disk space into account when deciding where to allocate shards, so nodes can fill their disks; this is usually appropriate only in development or test environments.

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.threshold_enabled": false
  }
}
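To illustrate what this setting switches off, here is a deliberately simplified Python model of the disk check for a single node. The function name is hypothetical. The 85% figure is the documented default for the low watermark, but the real allocation decider also applies a high watermark and a flood-stage watermark and accounts for shard sizes, none of which is modeled here.

```python
def may_receive_new_shards(disk_used_pct: float,
                           threshold_enabled: bool = True,
                           low_watermark_pct: float = 85.0) -> bool:
    """Simplified disk-based allocation check for one node."""
    if not threshold_enabled:
        # Disk usage is ignored, as with threshold_enabled set to false.
        return True
    # Nodes above the low watermark are not given new shards.
    return disk_used_pct <= low_watermark_pct


print(may_receive_new_shards(70.0))                           # True
print(may_receive_new_shards(92.0))                           # False
print(may_receive_new_shards(92.0, threshold_enabled=False))  # True
```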

4. Check Cluster Health

The API call below retrieves the current health status of the cluster. The response includes the cluster name, status (green, yellow, or red), number of nodes, number of data nodes, counts of active, initializing, and unassigned shards, and more.

GET /_cluster/health
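The three status colors follow a simple rule: red means at least one primary shard is unassigned, yellow means all primaries are assigned but at least one replica is not, and green means every shard copy is assigned. A minimal Python sketch of that rule follows; the function and its two parameters are hypothetical, since the health API itself reports a single unassigned_shards count rather than separate primary and replica counts.

```python
def cluster_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Derive the health color from counts of unassigned shard copies."""
    if unassigned_primaries > 0:
        return "red"      # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"   # all data available, but redundancy is reduced
    return "green"        # every primary and replica is assigned


print(cluster_status(0, 0))  # green
print(cluster_status(0, 2))  # yellow
print(cluster_status(1, 2))  # red
```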

5. Add Ingest Nodes

The node configuration below specifies a node that can preprocess documents (node.ingest: true) but cannot be elected as the master (node.master: false) or store data shards (node.data: false). In Elasticsearch 7.9 and later, the equivalent setting is node.roles: [ ingest ].

node.master: false
node.data: false
node.ingest: true
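Since Elasticsearch 7.9, roles are normally declared with a single node.roles list rather than the boolean flags used in the configurations above. The Python helper below is a hypothetical sketch of that mapping for the three roles used in this article only; Elasticsearch defines further roles that it does not cover.

```python
from typing import List


def legacy_flags_to_roles(master: bool, data: bool, ingest: bool) -> List[str]:
    """Translate the legacy boolean settings into a node.roles list."""
    roles = []
    if master:
        roles.append("master")
    if data:
        roles.append("data")
    if ingest:
        roles.append("ingest")
    return roles


def as_yaml_line(roles: List[str]) -> str:
    """Format the roles the way they would appear in elasticsearch.yml."""
    return "node.roles: [ " + ", ".join(roles) + " ]"


print(as_yaml_line(legacy_flags_to_roles(True, False, False)))   # node.roles: [ master ]
print(as_yaml_line(legacy_flags_to_roles(False, False, True)))   # node.roles: [ ingest ]
print(as_yaml_line(legacy_flags_to_roles(False, False, False)))  # node.roles: [  ]
```

An empty roles list is how a coordinating-only node is configured.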

6. Update Index Settings

The request below sets the number of replicas for the “my_index” index to 1, meaning each primary shard will have one replica.

PUT /my_index/_settings
{
  "settings": {
    "number_of_replicas": 1
  }
}

7. Verify Cluster State

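The request below retrieves the current cluster state, including information about the nodes, indices, shard routing, and cluster metadata. The full response can be very large on a real cluster, so it is often narrowed to specific metrics, for example GET /_cluster/state/metadata.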

GET /_cluster/state

Conclusion

Overall, Elasticsearch’s cluster architecture and node roles play a pivotal role in the efficient management and scalability of search infrastructures. By understanding the roles of master, data, ingest, and coordinating-only nodes, organizations can optimize their cluster configurations for specific use cases and workloads.

The practical examples provided offer a clear guide on how to configure nodes, update settings, and manage cluster health, making it easier for administrators and developers to deploy and maintain Elasticsearch clusters effectively.
