Setting Up the Ingest Pipeline

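Required Plugin: Ingest Attachment Processor Plugin (ingest-attachment). On releases that do not bundle it, install it on every node with bin/elasticsearch-plugin install ingest-attachment and restart the node.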

An ingest pipeline preprocesses documents before they are indexed. For attachments, the pipeline uses the attachment processor to extract the text content and metadata from a base64-encoded file so that both become searchable.

Step 1: Define the Ingest Pipeline

Create an ingest pipeline named attachment_pipeline:

curl -X PUT "localhost:9200/_ingest/pipeline/attachment_pipeline" -H 'Content-Type: application/json' -d'
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "data"
      }
    },
    {
      "remove": {
        "field": "data"
      }
    }
  ]
}'

This pipeline extracts the attachment's content and metadata from the data field, stores them under the attachment field (the processor's default target), and then removes the original base64-encoded data to save space.
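The same pipeline definition can also be sent from a script. The following is a minimal sketch using only the Python standard library; it assumes an unsecured node at http://localhost:9200 as in the curl example, and the helper name build_pipeline_request is hypothetical. It builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request

# Assumption: an unsecured local node, matching the curl example.
ES_URL = "http://localhost:9200"


def build_pipeline_request(name: str, field: str = "data") -> urllib.request.Request:
    """Build (but do not send) the PUT request that defines the pipeline."""
    body = {
        "description": "Extract attachment information",
        "processors": [
            # Extract content and metadata from the base64 field.
            {"attachment": {"field": field}},
            # Drop the raw base64 payload once it has been processed.
            {"remove": {"field": field}},
        ],
    }
    return urllib.request.Request(
        url=f"{ES_URL}/_ingest/pipeline/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_pipeline_request("attachment_pipeline")
print(request.get_method(), request.full_url)

# To send it against a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The order of the processors matters: remove must come after attachment, otherwise there is nothing left to extract.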

Step 2: Indexing a Document with an Attachment

Prepare a sample document with a base64-encoded PDF file:

{
  "data": "JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PC9MaW5lYXJpemVkIDIgL0wgMTExMTENyCjIwL1UgMzY0MjMvTiAxL1RQIDEwMjcKPj4KZW5kb2JqCjw8L0VuY3J5cHR..."
}
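The value of data is simply the file's bytes encoded as base64. As a rough sketch, the document can be produced in Python as follows; build_attachment_document is a hypothetical helper, and the sample bytes stand in for a real PDF read from disk.

```python
import base64
import json


def build_attachment_document(file_bytes: bytes, field: str = "data") -> dict:
    """Return a document whose `field` holds the base64-encoded file."""
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {field: encoded}


# Stand-in bytes (a PDF header line). In practice, read a real file, e.g.
# open("report.pdf", "rb").read(), where report.pdf is a hypothetical name.
sample_bytes = b"%PDF-1.4\n"
document = build_attachment_document(sample_bytes)
print(json.dumps(document))  # {"data": "JVBERi0xLjQK"}
```

The encoded header JVBERi0xLjQK is the same prefix that opens the sample string above, which is a quick way to recognise base64-encoded PDF data.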

Index this document using the attachment_pipeline:

curl -X PUT "localhost:9200/myindex/_doc/1?pipeline=attachment_pipeline" -H 'Content-Type: application/json' -d'
{
  "data": "JVBERi0xLjQKJaqrrK0KNCAwIG9iago8PC9MaW5lYXJpemVkIDIgL0wgMTExMTENyCjIwL1UgMzY0MjMvTiAxL1RQIDEwMjcKPj4KZW5kb2JqCjw8L0VuY3J5cHR..."
}'
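The key detail in this request is the pipeline query parameter, which tells Elasticsearch to run the document through attachment_pipeline before indexing it. A minimal Python sketch of the same request, assuming an unsecured local node and a hypothetical helper named build_index_request:

```python
import json
import urllib.parse
import urllib.request

# Assumption: an unsecured local node, matching the curl example.
ES_URL = "http://localhost:9200"


def build_index_request(index: str, doc_id: str, document: dict,
                        pipeline: str) -> urllib.request.Request:
    """Build (but do not send) a PUT that indexes `document` via `pipeline`."""
    query = urllib.parse.urlencode({"pipeline": pipeline})
    path = f"{index}/_doc/{urllib.parse.quote(doc_id, safe='')}"
    return urllib.request.Request(
        url=f"{ES_URL}/{path}?{query}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# "JVBERi0xLjQK" is only the base64 of a PDF header line, used as a placeholder.
index_request = build_index_request(
    "myindex", "1", {"data": "JVBERi0xLjQK"}, "attachment_pipeline"
)
print(index_request.full_url)

# To send it against a running cluster:
# with urllib.request.urlopen(index_request) as response:
#     print(json.load(response))
```

Without the pipeline parameter (and with no default pipeline configured on the index), the document would be stored with the raw base64 string and no extracted text.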

Output:

The response confirms that the document was indexed. The extracted text and metadata are stored in the document's attachment field and do not appear in this acknowledgment:

{
  "_index": "myindex",
  "_id": "1",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}
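To see what the processor extracted, fetch the stored document (for example with GET myindex/_doc/1) and look at the attachment object inside _source. The sketch below shows one way to summarise such a response in Python. The sample response is hand-written for illustration, its values are invented, and summarize_attachment is a hypothetical helper.

```python
def summarize_attachment(get_response: dict, target_field: str = "attachment") -> dict:
    """Pull a few extracted fields out of a GET <index>/_doc/<id> response."""
    source = get_response.get("_source", {})
    attachment = source.get(target_field, {})
    return {
        # False when the remove processor has dropped the raw payload.
        "has_raw_data": "data" in source,
        "content_type": attachment.get("content_type"),
        "content_preview": (attachment.get("content") or "")[:40],
    }


# Hand-written sample response; the field values are illustrative only.
sample_response = {
    "_index": "myindex",
    "_id": "1",
    "found": True,
    "_source": {
        "attachment": {
            "content_type": "application/pdf",
            "language": "en",
            "content": "Quarterly report for the search team.",
            "content_length": 37,
        }
    },
}

summary = summarize_attachment(sample_response)
print(summary)
```

Because the pipeline removes the data field, has_raw_data is False here and only the extracted fields remain in the stored document.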

Indexing Attachments and Binary Data with Elasticsearch Plugins

Elasticsearch is best known for searching text and structured data, but it can also index binary content such as PDFs, office documents, and other attachments. It supports this through plugins that extract searchable text and metadata from those formats at ingest time.

This article will guide you through indexing attachments and binary data using Elasticsearch plugins, with detailed examples and outputs.
