Database Indexing: Complete Developer Guide
Database indexing is the practice of creating auxiliary data structures that speed up data retrieval at the cost of slower writes — using index types like B-tree, Hash, GiST, and GIN to match different query patterns.
What You’ll Learn
You’ll understand B-tree and hash index internals, composite and covering index design, index selectivity, when indexes hurt write performance, partial indexes, and how to monitor index usage in production.
Why Indexing Matters
Without indexes, every query becomes a full table scan — reading millions of rows to find a handful of matches. DodaZIP indexes file metadata from thousands of compressed archives; without proper database indexing, searching for a specific file would take minutes instead of milliseconds.
Database Indexing Learning Path
flowchart LR
A[SQL Basics] --> B[SQL Query Optimization]
B --> C[Database Indexing]
C --> D[Performance Monitoring]
C:::current
classDef current fill:#f90,color:#fff,stroke:#333,stroke-width:2px
EXPLAIN plans is helpful.What Is a Database Index?
Think of an index like the index at the back of a textbook. Without it, you’d flip through every page to find “B-tree” (full table scan). With the index, you go straight to the right page numbers (index lookup).
-- Creating an index speeds up queries on the indexed column
CREATE INDEX idx_users_email ON users (email);
-- Now this query uses the index instead of scanning the full table
SELECT * FROM users WHERE email = 'alice@example.com';B-Tree Indexes
The B-tree (balanced tree) is the default and most common index type. It works well for equality and range queries, sorting, and pattern matching with leading wildcards.
How B-Tree Works
flowchart TD
Root["Root Node<br/>50, 100"] --> L1["Leaf: 10, 20, 30, 40"]
Root --> L2["Leaf: 50, 60, 70, 80, 90"]
Root --> L3["Leaf: 100, 110, 120"]
L1 --> Data1[Row pointers → Table]
L2 --> Data2[Row pointers → Table]
L3 --> Data3[Row pointers → Table]
B-Tree Operations
-- B-tree supports these operations efficiently:
SELECT * FROM users WHERE email = 'alice@example.com'; -- Equality
SELECT * FROM users WHERE age > 25; -- Range
SELECT * FROM users ORDER BY created_at; -- Sorting
SELECT * FROM users WHERE name LIKE 'Ali%'; -- Prefix match (but NOT '%li%')Hash Indexes
Hash indexes use a hash function to map keys to bucket locations. They’re extremely fast for equality lookups but useless for range queries or sorting.
-- PostgreSQL: Hash index
CREATE INDEX idx_orders_id_hash ON orders USING HASH (order_id);
-- Fast for: exact match
SELECT * FROM orders WHERE order_id = 12345;
-- NOT supported: range, sort, or partial matches
SELECT * FROM orders WHERE order_id > 10000; -- Won't use hash indexHash indexes are smaller than B-tree indexes and faster for point lookups, but they cannot support order-based operations.
GiST and GIN Indexes
These are specialized index types for non-standard data.
GiST (Generalized Search Tree)
Best for geometric data, full-text search ranking, and range types.
-- GiST index on a geometry column (PostgreSQL + PostGIS)
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);
-- Efficient spatial query
SELECT * FROM locations
WHERE coordinates <@ box '(0,0,100,100)'; -- Points within a bounding boxGIN (Generalized Inverted Index)
Best for array columns, JSONB, and full-text search (when you need speed over ranking).
-- GIN index on a JSONB column
CREATE INDEX idx_products_attrs ON products USING GIN (attributes);
-- Fast JSONB containment query
SELECT * FROM products
WHERE attributes @> '{"color": "red", "size": "M"}';Composite Indexes
A composite index (multi-column index) indexes multiple columns together. Column order matters significantly.
-- Composite index on (country, status, created_at)
CREATE INDEX idx_orders_lookup ON orders (country, status, created_at);The Leftmost Prefix Rule
The index above can accelerate:
-- Uses index: country is the leftmost column
SELECT * FROM orders WHERE country = 'US';
-- Uses index: country + status (leftmost prefix)
SELECT * FROM orders WHERE country = 'US' AND status = 'shipped';
-- Uses index: country + status + created_at (full index)
SELECT * FROM orders WHERE country = 'US' AND status = 'shipped' AND created_at > '2025-01-01';
-- Cannot use index: skips the leftmost column
SELECT * FROM orders WHERE status = 'shipped'; -- Full scanColumn ordering rule: Put high-selectivity (unique) columns first, then equality columns, then range columns.
Index Selectivity
Selectivity measures how many rows a given value matches. High selectivity = few rows per value (like email). Low selectivity = many rows per value (like boolean is_active).
-- High selectivity: Each email is unique → excellent index candidate
CREATE INDEX idx_users_email ON users (email);
-- Low selectivity: Only two values (true/false) → poor single index
CREATE INDEX idx_users_active ON users (is_active); -- Rarely useful aloneRule of thumb: Only index columns with selectivity above 20% (each value matches less than 20% of rows). Combining low-selectivity columns in a composite index can still be effective.
Covering Indexes
A covering index contains all columns needed by a query, allowing index-only scans — the database never reads the table at all.
-- Query needs: id, status, total
SELECT id, status, total FROM orders WHERE status = 'pending';
-- Regular index: find matching rows in index, then fetch 'total' from table
CREATE INDEX idx_orders_status ON orders (status);
-- Covering index: includes 'total' so no table access needed
CREATE INDEX idx_orders_status_covering ON orders (status) INCLUDE (id, total);Expected performance: Index-only scans can be 2-10x faster than regular index scans because they avoid disk I/O to the main table.
When Indexes Hurt (Write Overhead)
Every index adds overhead on INSERT, UPDATE, and DELETE because the database must maintain the index structure alongside the table data.
-- A table with 10 indexes means each INSERT updates 11 structures (table + 10 indexes)Index overhead by the numbers:
| Indexes | Read Speed | Write Speed | Storage |
|---|---|---|---|
| 0 | Slow | Fast | Minimal |
| 3-5 | Fast | OK | Moderate |
| 10+ | Very Fast | Slow | High |
| 20+ | Marginal gain | Very Slow | Very High |
Rule of thumb: No more than 5-7 indexes per table in OLTP systems. For analytical (OLAP) workloads, more indexes are acceptable.
Partial Indexes
A partial index indexes only a subset of rows meeting a condition. It’s smaller and faster than a full index.
-- Full index: indexes all rows
CREATE INDEX idx_orders_all ON orders (status);
-- Partial index: only indexes active/important rows
CREATE INDEX idx_orders_active ON orders (status) WHERE status IN ('pending', 'processing');
-- PostgreSQL partial unique index
CREATE UNIQUE INDEX idx_active_subscriptions ON subscriptions (user_id)
WHERE status = 'active'; -- Each user can have only one active subscriptionPartial indexes are a PostgreSQL strength, but SQLite and SQL Server also support them. MySQL does not support partial indexes directly.
Index Maintenance
Indexes degrade over time due to page splits and fragmentation.
-- PostgreSQL: Rebuild an index
REINDEX INDEX idx_users_email;
-- MySQL: Reorganize a table (rebuilds indexes)
OPTIMIZE TABLE orders;
-- Recommended frequency: monthly for busy tables, quarterly for static onesMonitoring Index Usage
-- PostgreSQL: Find unused indexes
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY tablename;
-- Remove unused indexes to speed up writes
DROP INDEX IF EXISTS unused_index_name;Unused indexes are pure overhead. Run this query monthly and drop indexes that haven’t been used in weeks.
Common Database Indexing Errors
1. Indexing Every Column “Just in Case”
Each unnecessary index slows writes and wastes storage. Only index columns that appear in WHERE, JOIN, or ORDER BY clauses of frequent queries.
2. Wrong Column Order in Composite Indexes
Putting a low-selectivity column first defeats the index’s purpose. CREATE INDEX ON orders (status, country) — if status has only 3 values, the first column prunes nothing useful.
3. Indexing Low-Cardinality Columns Alone
A standalone index on gender (2 values) or is_active (2 values) is almost never useful. Combine them with other columns in a composite index.
4. Ignoring Index Maintenance
Indexes fragment over time. A fragmented B-tree can be 30-50% slower. Schedule regular REINDEX or OPTIMIZE TABLE runs.
5. Using Hash Indexes for Range Queries
Hash indexes only support equality (=). If you also need >, <, BETWEEN, or ORDER BY, use a B-tree index.
6. Forgetting Foreign Key Indexes
Foreign key columns need indexes for efficient JOINs. MySQL doesn’t auto-index them; PostgreSQL does (but may need composite indexes for multi-column FKs).
7. Creating Duplicate Indexes
-- These are redundant (leftmost prefix rule)
CREATE INDEX idx_a_b ON t (a, b);
CREATE INDEX idx_a ON t (a); -- Redundant: already covered by idx_a_bPractice Questions
1. Why does column order matter in a composite index?
The index can only be used when queries reference columns from the leftmost prefix. A query filtering only on the second column can’t use the index.
2. When would you choose a GIN index over a B-tree?
When indexing array columns, JSONB data, or full-text search vectors. B-tree can’t efficiently index these types.
3. What’s a partial index and when should you use one?
An index that only covers rows matching a WHERE condition. Use it when you frequently query a specific subset — like “active subscriptions” or “pending orders.”
4. How do you find unused indexes in PostgreSQL?
Query pg_stat_user_indexes and look for rows where idx_scan = 0.
5. Challenge: Design indexes for an e-commerce database.
Given this query pattern:
SELECT id, name, price, stock
FROM products
WHERE category_id = 5 AND price BETWEEN 10 AND 100 AND stock > 0
ORDER BY price ASC
LIMIT 20;Design the optimal indexes and explain your reasoning. Answer: Create a covering composite index on products(category_id, price) INCLUDE (name, stock) — this covers the WHERE filter, ORDER BY, and all selected columns in one index-only scan.
FAQ
Try It Yourself
Run this query against your database to identify index candidates:
-- Find queries that do sequential scans on large tables (PostgreSQL)
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100 AND seq_tup_read > 100000
ORDER BY seq_tup_read DESC;For each result, analyze the queries accessing that table and add appropriate indexes.
What’s Next
You’ve completed the Database Indexing tutorial. Start by auditing your current indexes, removing unused ones, and adding targeted composite indexes for your slowest queries. Every index you drop speeds up your writes, and every well-placed one speeds up your reads.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro