Skip to content
Database Indexing: Complete Developer Guide

Database Indexing: Complete Developer Guide

DodaTech Updated Jun 20, 2026 9 min read

Database indexing is the practice of creating auxiliary data structures that speed up data retrieval at the cost of slower writes — using index types like B-tree, Hash, GiST, and GIN to match different query patterns.

What You’ll Learn

You’ll understand B-tree and hash index internals, composite and covering index design, index selectivity, when indexes hurt write performance, partial indexes, and how to monitor index usage in production.

Why Indexing Matters

Without indexes, every query becomes a full table scan — reading millions of rows to find a handful of matches. DodaZIP indexes file metadata from thousands of compressed archives; without proper database indexing, searching for a specific file would take minutes instead of milliseconds.

Database Indexing Learning Path

    flowchart LR
  A[SQL Basics] --> B[SQL Query Optimization]
  B --> C[Database Indexing]
  C --> D[Performance Monitoring]
  C:::current
  classDef current fill:#f90,color:#fff,stroke:#333,stroke-width:2px
  
Prerequisites: Understanding of basic SQL queries and MySQL or PostgreSQL syntax. Familiarity with EXPLAIN plans is helpful.

What Is a Database Index?

Think of an index like the index at the back of a textbook. Without it, you’d flip through every page to find “B-tree” (full table scan). With the index, you go straight to the right page numbers (index lookup).

-- Creating an index speeds up queries on the indexed column
CREATE INDEX idx_users_email ON users (email);

-- Now this query uses the index instead of scanning the full table
SELECT * FROM users WHERE email = 'alice@example.com';

B-Tree Indexes

The B-tree (balanced tree) is the default and most common index type. It works well for equality and range queries, sorting, and pattern matching with leading wildcards.

How B-Tree Works

    flowchart TD
    Root["Root Node<br/>50, 100"] --> L1["Leaf: 10, 20, 30, 40"]
    Root --> L2["Leaf: 50, 60, 70, 80, 90"]
    Root --> L3["Leaf: 100, 110, 120"]
    L1 --> Data1[Row pointers → Table]
    L2 --> Data2[Row pointers → Table]
    L3 --> Data3[Row pointers → Table]
  

B-Tree Operations

-- B-tree supports these operations efficiently:
SELECT * FROM users WHERE email = 'alice@example.com';  -- Equality
SELECT * FROM users WHERE age > 25;                      -- Range
SELECT * FROM users ORDER BY created_at;                 -- Sorting
SELECT * FROM users WHERE name LIKE 'Ali%';              -- Prefix match (but NOT '%li%')

Hash Indexes

Hash indexes use a hash function to map keys to bucket locations. They’re extremely fast for equality lookups but useless for range queries or sorting.

-- PostgreSQL: Hash index
CREATE INDEX idx_orders_id_hash ON orders USING HASH (order_id);

-- Fast for: exact match
SELECT * FROM orders WHERE order_id = 12345;

-- NOT supported: range, sort, or partial matches
SELECT * FROM orders WHERE order_id > 10000;  -- Won't use hash index

Hash indexes are smaller than B-tree indexes and faster for point lookups, but they cannot support order-based operations.

GiST and GIN Indexes

These are specialized index types for non-standard data.

GiST (Generalized Search Tree)

Best for geometric data, full-text search ranking, and range types.

-- GiST index on a geometry column (PostgreSQL + PostGIS)
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);

-- Efficient spatial query
SELECT * FROM locations
WHERE coordinates <@ box '(0,0,100,100)';  -- Points within a bounding box

GIN (Generalized Inverted Index)

Best for array columns, JSONB, and full-text search (when you need speed over ranking).

-- GIN index on a JSONB column
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);

-- Fast JSONB containment query
SELECT * FROM products
WHERE attributes @> '{"color": "red", "size": "M"}';

Composite Indexes

A composite index (multi-column index) indexes multiple columns together. Column order matters significantly.

-- Composite index on (country, status, created_at)
CREATE INDEX idx_orders_lookup ON orders (country, status, created_at);

The Leftmost Prefix Rule

The index above can accelerate:

-- Uses index: country is the leftmost column
SELECT * FROM orders WHERE country = 'US';

-- Uses index: country + status (leftmost prefix)
SELECT * FROM orders WHERE country = 'US' AND status = 'shipped';

-- Uses index: country + status + created_at (full index)
SELECT * FROM orders WHERE country = 'US' AND status = 'shipped' AND created_at > '2025-01-01';

-- Cannot use index: skips the leftmost column
SELECT * FROM orders WHERE status = 'shipped';  -- Full scan

Column ordering rule: Put high-selectivity (unique) columns first, then equality columns, then range columns.

Index Selectivity

Selectivity measures how many rows a given value matches. High selectivity = few rows per value (like email). Low selectivity = many rows per value (like boolean is_active).

-- High selectivity: Each email is unique → excellent index candidate
CREATE INDEX idx_users_email ON users (email);

-- Low selectivity: Only two values (true/false) → poor single index
CREATE INDEX idx_users_active ON users (is_active);  -- Rarely useful alone

Rule of thumb: Only index columns with selectivity above 20% (each value matches less than 20% of rows). Combining low-selectivity columns in a composite index can still be effective.

Covering Indexes

A covering index contains all columns needed by a query, allowing index-only scans — the database never reads the table at all.

-- Query needs: id, status, total
SELECT id, status, total FROM orders WHERE status = 'pending';

-- Regular index: find matching rows in index, then fetch 'total' from table
CREATE INDEX idx_orders_status ON orders (status);

-- Covering index: includes 'total' so no table access needed
CREATE INDEX idx_orders_status_covering ON orders (status) INCLUDE (id, total);

Expected performance: Index-only scans can be 2-10x faster than regular index scans because they avoid disk I/O to the main table.

When Indexes Hurt (Write Overhead)

Every index adds overhead on INSERT, UPDATE, and DELETE because the database must maintain the index structure alongside the table data.

-- A table with 10 indexes means each INSERT updates 11 structures (table + 10 indexes)

Index overhead by the numbers:

IndexesRead SpeedWrite SpeedStorage
0SlowFastMinimal
3-5FastOKModerate
10+Very FastSlowHigh
20+Marginal gainVery SlowVery High

Rule of thumb: No more than 5-7 indexes per table in OLTP systems. For analytical (OLAP) workloads, more indexes are acceptable.

Partial Indexes

A partial index indexes only a subset of rows meeting a condition. It’s smaller and faster than a full index.

-- Full index: indexes all rows
CREATE INDEX idx_orders_all ON orders (status);

-- Partial index: only indexes active/important rows
CREATE INDEX idx_orders_active ON orders (status) WHERE status IN ('pending', 'processing');

-- PostgreSQL partial unique index
CREATE UNIQUE INDEX idx_active_subscriptions ON subscriptions (user_id)
WHERE status = 'active';  -- Each user can have only one active subscription

Partial indexes are a PostgreSQL strength, but SQLite and SQL Server also support them. MySQL does not support partial indexes directly.

Index Maintenance

Indexes degrade over time due to page splits and fragmentation.

-- PostgreSQL: Rebuild an index
REINDEX INDEX idx_users_email;

-- MySQL: Reorganize a table (rebuilds indexes)
OPTIMIZE TABLE orders;

-- Recommended frequency: monthly for busy tables, quarterly for static ones

Monitoring Index Usage

-- PostgreSQL: Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;

-- Remove unused indexes to speed up writes
DROP INDEX IF EXISTS unused_index_name;

Unused indexes are pure overhead. Run this query monthly and drop indexes that haven’t been used in weeks.

Common Database Indexing Errors

1. Indexing Every Column “Just in Case”

Each unnecessary index slows writes and wastes storage. Only index columns that appear in WHERE, JOIN, or ORDER BY clauses of frequent queries.

2. Wrong Column Order in Composite Indexes

Putting a low-selectivity column first defeats the index’s purpose. CREATE INDEX ON orders (status, country) — if status has only 3 values, the first column prunes nothing useful.

3. Indexing Low-Cardinality Columns Alone

A standalone index on gender (2 values) or is_active (2 values) is almost never useful. Combine them with other columns in a composite index.

4. Ignoring Index Maintenance

Indexes fragment over time. A fragmented B-tree can be 30-50% slower. Schedule regular REINDEX or OPTIMIZE TABLE runs.

5. Using Hash Indexes for Range Queries

Hash indexes only support equality (=). If you also need >, <, BETWEEN, or ORDER BY, use a B-tree index.

6. Forgetting Foreign Key Indexes

Foreign key columns need indexes for efficient JOINs. MySQL doesn’t auto-index them; PostgreSQL does (but may need composite indexes for multi-column FKs).

7. Creating Duplicate Indexes

-- These are redundant (leftmost prefix rule)
CREATE INDEX idx_a_b ON t (a, b);
CREATE INDEX idx_a ON t (a);  -- Redundant: already covered by idx_a_b

Practice Questions

1. Why does column order matter in a composite index?

The index can only be used when queries reference columns from the leftmost prefix. A query filtering only on the second column can’t use the index.

2. When would you choose a GIN index over a B-tree?

When indexing array columns, JSONB data, or full-text search vectors. B-tree can’t efficiently index these types.

3. What’s a partial index and when should you use one?

An index that only covers rows matching a WHERE condition. Use it when you frequently query a specific subset — like “active subscriptions” or “pending orders.”

4. How do you find unused indexes in PostgreSQL?

Query pg_stat_user_indexes and look for rows where idx_scan = 0.

5. Challenge: Design indexes for an e-commerce database.

Given this query pattern:

SELECT id, name, price, stock
FROM products
WHERE category_id = 5 AND price BETWEEN 10 AND 100 AND stock > 0
ORDER BY price ASC
LIMIT 20;

Design the optimal indexes and explain your reasoning. Answer: Create a covering composite index on products(category_id, price) INCLUDE (name, stock) — this covers the WHERE filter, ORDER BY, and all selected columns in one index-only scan.

FAQ

What’s the difference between clustered and non-clustered indexes?
A clustered index determines the physical order of rows in the table (one per table). A non-clustered index is a separate structure with pointers to rows. MySQL InnoDB always uses the primary key as the clustered index.
How many indexes is too many?
For OLTP systems, 5-7 per table is a reasonable limit. Beyond that, write performance degrades noticeably. Analytical systems can handle more since writes are less frequent.
Do indexes work on views?
Materialized views can have indexes. Regular (virtual) views cannot be indexed directly — index the underlying tables instead.
Should I index boolean columns?
Almost never. A boolean column splits data into two groups (true/false), which is too coarse for efficient index usage. Combine it with other columns in a composite index if needed.

Try It Yourself

Run this query against your database to identify index candidates:

-- Find queries that do sequential scans on large tables (PostgreSQL)
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100 AND seq_tup_read > 100000
ORDER BY seq_tup_read DESC;

For each result, analyze the queries accessing that table and add appropriate indexes.

What’s Next

You’ve completed the Database Indexing tutorial. Start by auditing your current indexes, removing unused ones, and adding targeted composite indexes for your slowest queries. Every index you drop speeds up your writes, and every well-placed one speeds up your reads.

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro