Learn Databases: Elasticsearch Guide — Search Engine and Analytics Database

DodaTech Updated Jun 7, 2026 10 min read

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene, designed for full-text search, structured querying, and real-time data analysis at massive scale.

What You’ll Learn

By the end of this tutorial, you’ll understand Elasticsearch’s inverted index architecture, write queries using match, term, and bool clauses, create mappings and aggregations, build Kibana dashboards, and manage a cluster for production use.

Why Elasticsearch Matters

Elasticsearch powers search for companies like Netflix, eBay, and Wikipedia. Doda Browser uses Elasticsearch for its instant search suggestions, while Durga Antivirus Pro leverages it for real-time threat log analysis across millions of endpoints. Learning Elasticsearch gives you a skill essential for log analytics, e-commerce search, and observability platforms.

Elasticsearch Learning Path

    flowchart LR
  A[SQL Basics] --> B[PostgreSQL]
  B --> C[Elasticsearch]
  C --> D[MongoDB]
  D --> E[Redis]
  E --> F[Database Design]
  C --> G{You Are Here}
  style G fill:#f90,color:#fff

Prerequisites: Basic understanding of SQL and databases. Familiarity with REST API concepts (HTTP methods, JSON) is helpful since Elasticsearch uses REST for all operations.

What Is Elasticsearch? (The “Why” First)

Think of Elasticsearch as Google for your data. SQL databases are great at exact lookups and transactions, but pattern matching with LIKE handles full-text search poorly: try searching “quick brown fox” with a typo in MySQL and you’ll see. Elasticsearch builds an inverted index: for every term, it stores the list of documents that contain that term. This makes text searches fast, and it supports fuzzy (typo-tolerant) matching and results ranked by relevance.
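
To make the inverted index idea concrete, here is a minimal Python sketch. It is a toy model, not how Lucene stores its index: the `tokenize` and `build_inverted_index` helpers and the sample sentences are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

# Hypothetical sample documents
docs = {
    1: "The quick brown fox",
    2: "A quick red fox jumps",
    3: "Lazy brown dog",
}
index = build_inverted_index(docs)

# A lookup is a dictionary access, not a scan over every document.
print(sorted(index["quick"]))                   # [1, 2]
print(sorted(index["brown"]))                   # [1, 3]
# Documents containing both terms: intersect the two posting sets.
print(sorted(index["quick"] & index["brown"]))  # [1]
```

The cost of a lookup depends on the number of matching documents rather than on the total size of the collection, which is why this structure suits search.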

Elasticsearch vs Traditional Databases

Feature	SQL Database	Elasticsearch
Query	SQL	JSON-based REST queries
Search	`LIKE '%term%'` (slow)	Full-text scoring (fast)
Schema	Rigid (ALTER TABLE)	Dynamic mapping
Joins	JOINs (normalized)	Denormalized (nested objects)
Analytics	GROUP BY	Aggregations (buckets, metrics)

Elasticsearch Architecture

    flowchart TB
    subgraph Cluster
        N1[Node 1 - Master]
        N2[Node 2 - Data]
        N3[Node 3 - Data]
        N4[Node 4 - Ingest]
    end
    subgraph Indices
        I1[Index: products]
        I2[Index: logs]
    end
    subgraph Shards
        I1 --> S1[Primary Shard 1]
        I1 --> S2[Primary Shard 2]
        I1 --> R1[Replica Shard 1]
        I1 --> R2[Replica Shard 2]
    end
    Client[REST Client] --> N1
    N1 --> N2
    N1 --> N3
    N1 --> N4
    N2 --> I1
    N3 --> I2

An index in Elasticsearch is like a database table. Each index is split into shards (partitioned across nodes) with replicas (copies for redundancy). This distributed architecture allows Elasticsearch to scale horizontally.
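
How a document ends up on a particular shard can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the routing value is the document ID, and `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch uses internally. The `pick_shard` function is hypothetical.

```python
import zlib

def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Simplified routing: hash the routing value (the _id by default),
    then take it modulo the number of primary shards.
    crc32 is a stand-in; Elasticsearch uses a Murmur3 hash internally."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# The same ID always maps to the same shard, so reads know where to look.
for doc_id in ["1", "2", "3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 2))
```

Because the result depends on the number of primary shards, that number cannot be changed in place once the index exists. The replica count can be adjusted at any time.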

Creating an Index and Mapping

Elasticsearch exposes a REST API for all operations:

# Create an index with explicit mapping
PUT /products
{
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard"
      },
      "description": {
        "type": "text"
      },
      "price": {
        "type": "float"
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "in_stock": {
        "type": "boolean"
      },
      "created_at": {
        "type": "date"
      }
    }
  }
}

Field Types Explained:

text — full-text searchable (analyzed: broken into tokens, lowercased)
keyword — exact match, filtering, aggregations (not analyzed)
float, integer — numeric values for range queries
boolean — true/false
date — date/time values with format support
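
The difference between text and keyword fields can be illustrated with a toy analyzer in Python. The `analyze` function below is only a rough stand-in for the standard analyzer (the real one follows Unicode text segmentation rules), and the sample value is taken from the product description used in this guide.

```python
import re

def analyze(text):
    """Toy stand-in for the standard analyzer:
    split on non-alphanumeric characters, then lowercase each token."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

value = "Noise-canceling over-ear headphones"

text_tokens = analyze(value)   # roughly what a text field indexes
keyword_tokens = [value]       # a keyword field indexes the whole value unchanged

print(text_tokens)                       # ['noise', 'canceling', 'over', 'ear', 'headphones']
print("Noise-canceling" in text_tokens)  # False: an unanalyzed lookup misses
print("noise" in text_tokens)            # True
print(value in keyword_tokens)           # True: keyword fields match only the exact value
```

This is why keyword suits filtering and aggregations on values such as categories or tags, while text suits free-form descriptions.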

Indexing Documents

# Index a single document
POST /products/_doc/1
{
  "name": "Wireless Bluetooth Headphones",
  "description": "Noise-canceling over-ear headphones with 30-hour battery life and deep bass response.",
  "price": 149.99,
  "category": "Electronics",
  "tags": ["audio", "wireless", "headphones"],
  "in_stock": true,
  "created_at": "2026-06-07"
}

# Index multiple documents using bulk API
POST /_bulk
{ "index": { "_index": "products", "_id": "2" } }
{ "name": "USB-C Hub 7-in-1", "description": "Multi-port adapter with HDMI, USB-A, SD card reader for laptops.", "price": 45.99, "category": "Accessories", "tags": ["usb", "hub", "adapter"], "in_stock": true, "created_at": "2026-06-07" }
{ "index": { "_index": "products", "_id": "3" } }
{ "name": "Mechanical Keyboard", "description": "RGB backlit mechanical keyboard with Cherry MX Blue switches for typing.", "price": 89.99, "category": "Electronics", "tags": ["keyboard", "mechanical", "typing"], "in_stock": false, "created_at": "2026-06-07" }
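
The bulk body is newline-delimited JSON: one action line, then one source line, and the body must end with a newline. The following Python sketch builds such a body with the standard library only. The `build_bulk_body` helper is hypothetical, and sending the request (with the `Content-Type: application/x-ndjson` header) is left out.

```python
import json

def build_bulk_body(index_name, docs):
    """Return an NDJSON string: an action line followed by a source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the final line to be terminated by a newline.
    return "\n".join(lines) + "\n"

# Trimmed-down versions of the documents above
docs = {
    "2": {"name": "USB-C Hub 7-in-1", "price": 45.99},
    "3": {"name": "Mechanical Keyboard", "price": 89.99},
}
body = build_bulk_body("products", docs)
print(body, end="")
```

Each JSON object must stay on a single line, so pretty-printed JSON is not valid in a bulk body.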

Querying with the Search API

Match Query — Full-Text Search

GET /products/_search
{
  "query": {
    "match": {
      "description": "noise canceling headphones"
    }
  }
}

Output (abbreviated):

{
  "hits": {
    "total": { "value": 1 },
    "max_score": 1.2,
    "hits": [
      {
        "_id": "1",
        "_score": 1.2,
        "_source": {
          "name": "Wireless Bluetooth Headphones",
          "description": "Noise-canceling over-ear headphones with 30-hour battery life and deep bass response.",
          "price": 149.99
        }
      }
    ]
  }
}

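The match query analyzes the input text with the field’s analyzer and, by default, returns documents that contain any of the resulting terms (set "operator": "and" to require all of them). Results are ranked by _score (relevance).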

Term Query — Exact Match

GET /products/_search
{
  "query": {
    "term": {
      "category": "Electronics"
    }
  }
}

Output (abbreviated):

{
  "hits": {
    "total": { "value": 2 },
    "hits": [
      { "_id": "1", "_source": { "name": "Wireless Bluetooth Headphones" } },
      { "_id": "3", "_source": { "name": "Mechanical Keyboard" } }
    ]
  }
}

The term query is for exact matches on keyword fields. It does NOT analyze the input — it looks for the exact value.

Bool Query — Combining Conditions

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "keyboard" } }
      ],
      "filter": [
        { "term": { "category": "Electronics" } },
        { "range": { "price": { "lte": 100 } } }
      ],
      "must_not": [
        { "term": { "tags": "wireless" } }
      ]
    }
  }
}

Bool Query Clauses:

must — required, contributes to score
filter — required, no scoring (faster, cacheable)
should — optional, increases score
must_not — excluded

Aggregations — Analytics on Your Data

Aggregations are Elasticsearch’s equivalent of SQL’s GROUP BY, with the addition that bucket and metric aggregations can be nested inside one another:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "in_stock_count": {
          "filter": { "term": { "in_stock": true } }
        }
      }
    }
  }
}

Output (abbreviated):

{
  "aggregations": {
    "by_category": {
      "buckets": [
        {
          "key": "Electronics",
          "doc_count": 2,
          "avg_price": { "value": 119.99 },
          "in_stock_count": { "doc_count": 1 }
        },
        {
          "key": "Accessories",
          "doc_count": 1,
          "avg_price": { "value": 45.99 },
          "in_stock_count": { "doc_count": 1 }
        }
      ]
    }
  }
}

You can nest aggregations arbitrarily — buckets within buckets, with metrics at each level. This powers the drill-down analytics dashboards used by Durga Antivirus Pro for threat classification.
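
The same bucket-then-metric computation can be written in plain Python, which may help when checking aggregation results by hand. This is an illustrative sketch over the three sample products, not a description of how Elasticsearch executes aggregations.

```python
from collections import defaultdict

products = [
    {"category": "Electronics", "price": 149.99, "in_stock": True},
    {"category": "Accessories", "price": 45.99, "in_stock": True},
    {"category": "Electronics", "price": 89.99, "in_stock": False},
]

# Bucket step: group documents by category (the "terms" aggregation).
buckets = defaultdict(list)
for p in products:
    buckets[p["category"]].append(p)

# Metric step: compute values inside each bucket (the "avg" and "filter" sub-aggregations).
result = {}
for key, docs in buckets.items():
    result[key] = {
        "doc_count": len(docs),
        "avg_price": sum(d["price"] for d in docs) / len(docs),
        "in_stock_count": sum(1 for d in docs if d["in_stock"]),
    }

# The terms aggregation orders buckets by doc_count, largest first.
for key, stats in sorted(result.items(), key=lambda kv: -kv[1]["doc_count"]):
    print(key, stats["doc_count"], round(stats["avg_price"], 2), stats["in_stock_count"])
# Electronics 2 119.99 1
# Accessories 1 45.99 1
```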

Kibana — Visualizing Elasticsearch Data

Kibana is the visualization layer that sits on top of Elasticsearch:

    flowchart LR
    A[Elasticsearch] --> B[Kibana]
    B --> C[Dashboard]
    B --> D[Discover]
    B --> E[Canvas]
    B --> F[Maps]
    C --> G[Bar Charts]
    C --> H[Line Graphs]
    C --> I[Pie Charts]
    C --> J[Data Tables]

Key Kibana Features:

Discover — search and filter raw data interactively
Visualize — create charts, maps, and graphs from aggregations
Dashboard — combine visualizations into real-time monitoring panels
Canvas — custom pixel-perfect presentations
Machine Learning — anomaly detection on time-series data
Alerting — notify when queries match conditions

Common Elasticsearch Errors

1. `index_not_found_exception`

GET /nonexistent_index/_search
# Returns 404: index_not_found_exception

Fix: Verify index names with GET /_cat/indices?v. Create the index first, or use ignore_unavailable=true in multi-index queries.

2. `mapper_parsing_exception`

# Trying to index a non-numeric string into a float field
POST /products/_doc/4
{ "price": "not_a_number" }
# Returns: mapper_parsing_exception

Fix: Ensure field types match the mapping. Check the mapping with GET /products/_mapping and correct your data.

3. `circuit_breaking_exception`

Elasticsearch has memory circuit breakers to prevent OutOfMemoryErrors. If a query tries to use too much memory, it fails. Fix: Reduce the query scope, optimize aggregations, or increase indices.breaker.request.limit.

4. `cluster_block_exception` due to disk space

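When disk usage on a node exceeds the flood-stage watermark (95% by default), Elasticsearch marks every index with a shard on that node as read-only (deletes are still allowed) so the disk does not fill up completely. Fix: Free disk space, add nodes, or temporarily raise the cluster.routing.allocation.disk.watermark.* thresholds.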
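
The disk thresholds form three levels, which the following Python sketch classifies. The default percentages (85, 90, 95) are the documented defaults for the low, high, and flood-stage watermarks. The `disk_watermark_state` function itself is hypothetical and only labels a usage figure.

```python
def disk_watermark_state(used_percent, low=85.0, high=90.0, flood=95.0):
    """Classify a node's disk usage against the three watermark levels."""
    if used_percent >= flood:
        return "flood_stage"  # indices with a shard on this node become read-only
    if used_percent >= high:
        return "high"         # Elasticsearch tries to move shards off this node
    if used_percent >= low:
        return "low"          # no new shards are allocated to this node
    return "ok"

for pct in (50, 87, 92, 96):
    print(pct, disk_watermark_state(pct))
# 50 ok
# 87 low
# 92 high
# 96 flood_stage
```

Alerting at the low or high level leaves time to act before writes are blocked.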

5. Search Returns Zero Results for Text Field

# term query on a text field (wrong!)
GET /products/_search
{ "query": { "term": { "description": "Noise-canceling" } } }
# Returns 0 hits!

Fix: Text fields are analyzed — the value “Noise-canceling” gets tokenized to [“noise”, “canceling”]. Use match query for text fields instead of term. Use term only on keyword fields.

6. `too_many_clauses` in Bool Query

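A query that expands to too many clauses fails with too_many_clauses. In older versions the limit defaults to 1024 and can be raised with indices.query.bool.max_clause_count; in recent 8.x releases that setting is deprecated and Elasticsearch derives the limit from the heap size and the search thread pool. Fix: Restructure the query (for example, replace many term clauses with a single terms query), or raise the limit on older versions.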

7. Shard Allocation Issues

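When a node goes down, shards may remain unassigned. Fix: Check GET /_cat/shards?v for UNASSIGNED shards and GET /_cluster/allocation/explain for the reason. After resolving the cause, retry failed allocations with POST /_cluster/reroute?retry_failed=true.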

Practice Questions

1. What is an inverted index?

An inverted index maps each unique word to the list of documents containing it, enabling fast full-text search. Instead of scanning documents for words, Elasticsearch looks up words in the index to find matching documents instantly.

2. What’s the difference between match and term queries?

match analyzes the input (tokenizes, lowercases) and searches text fields with scoring. term searches for exact values in keyword fields without analysis. Use match for full-text search, term for IDs, status values, or exact categories.

3. What are shards in Elasticsearch?

Shards are horizontal partitions of an index. Each shard is a complete Lucene index. Sharding allows Elasticsearch to distribute data across multiple nodes for scalability and parallel processing.

4. Challenge: Write a query that finds products with “headphones” in the name, priced between $50 and $200, and returns the average price per category.

GET /products/_search
{
  "query": {
    "bool": {
      "must": [{ "match": { "name": "headphones" } }],
      "filter": [{ "range": { "price": { "gte": 50, "lte": 200 } } }]
    }
  },
  "aggs": {
    "by_category": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}

5. What is the purpose of the _score field?

_score represents the relevance of a document to the query. Higher scores mean better matches. Elasticsearch computes it with the BM25 algorithm by default (releases before 5.0 defaulted to TF-IDF), based on term frequency, inverse document frequency, and field length.
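
The following Python sketch shows the shape of a BM25 score for a single term, using the default parameters k1 = 1.2 and b = 0.75. It is a simplified model: Lucene's implementation differs in details such as how field lengths are stored, so real scores will not match these numbers. The function name and sample figures are made up.

```python
import math

def bm25_term_score(freq, doc_len, avg_doc_len, num_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Score one term in one document.
    idf rewards rare terms; the tf part saturates as freq grows
    and is dampened for documents longer than average."""
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf

base = bm25_term_score(freq=1, doc_len=10, avg_doc_len=10, num_docs=1000, docs_with_term=50)
more_frequent = bm25_term_score(freq=2, doc_len=10, avg_doc_len=10, num_docs=1000, docs_with_term=50)
longer_doc = bm25_term_score(freq=1, doc_len=40, avg_doc_len=10, num_docs=1000, docs_with_term=50)
rarer_term = bm25_term_score(freq=1, doc_len=10, avg_doc_len=10, num_docs=1000, docs_with_term=5)

print(more_frequent > base)  # True: more occurrences raise the score
print(longer_doc < base)     # True: the same match in a longer field counts for less
print(rarer_term > base)     # True: rare terms are more informative
```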

Real-World Task: Build a Log Analytics Pipeline

Set up an Elasticsearch cluster to ingest and analyze application logs — the same pattern Durga Antivirus Pro uses for security event monitoring:

# Create a logs index with daily indexing pattern
PUT /logs-2026.06.07
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "level": { "type": "keyword" },
      "service": { "type": "keyword" },
      "message": { "type": "text" },
      "user_id": { "type": "keyword" },
      "response_time_ms": { "type": "integer" }
    }
  }
}

# Ingest a log entry
POST /logs-2026.06.07/_doc
{
  "timestamp": "2026-06-07T14:30:00Z",
  "level": "ERROR",
  "service": "auth-service",
  "message": "Failed login attempt for user jdoe from IP 192.168.1.100",
  "user_id": "jdoe",
  "response_time_ms": 2450
}

# Find the most common log levels, and the services behind them, in the last hour
GET /logs-2026.06.07/_search
{
  "size": 0,
  "query": {
    "range": { "timestamp": { "gte": "now-1h" } }
  },
  "aggs": {
    "error_levels": {
      "terms": { "field": "level" },
      "aggs": {
        "top_services": {
          "terms": { "field": "service" },
          "aggs": {
            "avg_response_time": { "avg": { "field": "response_time_ms" } }
          }
        }
      }
    }
  }
}
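
With one index per day, the ingesting application has to work out the index name for each event. A minimal Python sketch, assuming ISO 8601 timestamps and day boundaries in UTC (the `daily_index_name` helper and the `logs` prefix default are illustrative choices, not part of any Elasticsearch client):

```python
from datetime import datetime, timezone

def daily_index_name(timestamp_iso, prefix="logs"):
    """Return the daily index name (prefix-YYYY.MM.DD) for a timestamp, using the UTC date."""
    # Older Python versions do not parse a trailing "Z", so normalize it first.
    ts = datetime.fromisoformat(timestamp_iso.replace("Z", "+00:00"))
    ts = ts.astimezone(timezone.utc)
    return f"{prefix}-{ts:%Y.%m.%d}"

print(daily_index_name("2026-06-07T14:30:00Z"))       # logs-2026.06.07
print(daily_index_name("2026-06-07T23:30:00-05:00"))  # logs-2026.06.08
```

Searches can then cover several days at once with an index pattern such as logs-* in the request path.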

FAQ

What is the difference between Elasticsearch and a traditional SQL database?

Elasticsearch is optimized for full-text search and analytics, not transactions. SQL databases support ACID transactions and complex joins. Use Elasticsearch for search and logs; use SQL databases for transactional data. Many architectures use both together.

Is Elasticsearch free to use?

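Yes, the basic tier is free. Elasticsearch source code is available under the Elastic License 2.0, the SSPL, or (since 2024) the AGPLv3; OpenSearch is a separate fork licensed under Apache 2.0. Core security features such as TLS and role-based access control are included in the free tier, while features such as machine learning require a paid subscription.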

What is the Elastic Stack?

The Elastic Stack (formerly ELK) consists of Elasticsearch (search/analytics), Logstash (data processing pipeline), Kibana (visualization), and Beats (lightweight data shippers). Together they form a complete observability platform.

How does Elasticsearch handle indexing and search performance?

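Elasticsearch uses inverted indexes, near-real-time indexing (by default, new data becomes searchable within about 1 second, governed by index.refresh_interval), and distributed sharding. For read-heavy workloads, add replica shards. For write-heavy workloads, use bulk indexing and plan enough primary shards up front, since the primary shard count cannot be changed in place after the index is created.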

What is a mapping in Elasticsearch?

A mapping defines how documents and their fields are stored and indexed. It includes field types (text, keyword, integer, etc.), analyzers, and indexing options. Dynamic mapping auto-detects types; explicit mapping gives you control.

Try It Yourself

Start a local Elasticsearch instance with Docker and test these cluster management APIs:

# Check cluster health
GET /_cluster/health

# Output:
# {"cluster_name":"docker-cluster","status":"yellow","number_of_nodes":1,
#  "number_of_data_nodes":1,"active_primary_shards":10,
#  "active_shards":10,"relocating_shards":0,"initializing_shards":0,
#  "unassigned_shards":5}

# List all indices
GET /_cat/indices?v

# Monitor node stats
GET /_nodes/stats

# View running tasks
GET /_tasks

These monitoring patterns are built into Doda Browser’s search backend and Durga Antivirus Pro’s real-time threat detection pipeline for scanning millions of events per second.
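
The sample health response above can be interpreted with a few lines of Python. This sketch parses the JSON shown earlier and explains the status; the `summarize_health` helper is hypothetical, and fetching the response over HTTP is left out.

```python
import json

# The sample response from GET /_cluster/health shown above
health_json = """{"cluster_name":"docker-cluster","status":"yellow","number_of_nodes":1,
 "number_of_data_nodes":1,"active_primary_shards":10,
 "active_shards":10,"relocating_shards":0,"initializing_shards":0,
 "unassigned_shards":5}"""

def summarize_health(health):
    """Translate the status color into what it says about shard assignment."""
    status = health["status"]
    if status == "green":
        return "all primary and replica shards are assigned"
    if status == "yellow":
        return f"all primaries assigned, {health['unassigned_shards']} replica shard(s) unassigned"
    return "at least one primary shard is unassigned"

health = json.loads(health_json)
print(summarize_health(health))
# all primaries assigned, 5 replica shard(s) unassigned
```

A yellow status is expected on a single-node cluster: a replica is never placed on the same node as its primary, so replicas stay unassigned until a second node joins or number_of_replicas is set to 0.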

What’s Next

SQLite Guide

SQL Server Guide

MongoDB Explained

Congratulations on completing this Elasticsearch tutorial! Here’s where to go from here:

Practice daily — Consistency is more important than long study sessions
Build a project — Apply what you learned by building something real
Explore related topics — Check out other tutorials in the same category
Join the community — Discuss with other learners and share your progress

Remember: every expert was once a beginner. Keep coding!

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
