Database Indexes: Why the Right Structure Changes Query Performance
A database index is a data structure that makes lookups faster at the cost of additional storage and slower write performance. Getting indexing right is one of the highest-leverage performance improvements you can make for applications that read data frequently. The wrong indexes slow down writes and waste storage without helping reads. The right indexes turn queries that take seconds into queries that return instantly.
This article covers how to design a database indexing strategy based on actual query patterns, how to use EXPLAIN to validate index usage, and the common mistakes that cause indexing to fail. It focuses on practical approaches that apply to MySQL and PostgreSQL environments commonly used in business applications.
How Database Indexes Work
Most relational databases use B-tree indexes by default: a balanced tree structure where each node contains an ordered set of keys and pointers to the actual data. When you search for a value in a B-tree, the database traverses from the root node down through branches to the leaf nodes containing the data pointers. This search complexity is O(log n) rather than O(n) for a full table scan.
For a table with 10 million rows, this difference is significant. A query searching for a specific customer record might examine 10 million rows without an index, or it might examine fewer than 20 rows with the right index in place. The result can mean the difference between a query that takes 5 seconds and one that takes 5 milliseconds.
When you create an index on a column, the database builds the B-tree structure over the values in that column. When you query that column with a WHERE clause, the database uses the index to locate matching rows directly rather than scanning the entire table row by row.
Designing Indexes Based on Query Patterns
Design indexes based on the queries your application actually runs. Do not create indexes speculatively on every column. Instead, observe your access patterns and create indexes deliberately to support them. The three most important indexes to add first:
- Primary key index: Created automatically by the database. Every table should have a primary key defined. Use a UUID or bigint as the key type, not a natural key like an email address. Natural keys can change, and their variable length makes index operations less efficient.
- Foreign key indexes: Create an index on any column referenced by a foreign key in another table. Without this index, joining two large tables on an unindexed foreign key causes a full table scan on the referenced table. This is one of the most common causes of slow joins in production databases.
- WHERE clause indexes: For any column that appears frequently in WHERE clauses, create an index. The column order in composite indexes matters significantly, which the next section explains in detail.
-- Example: Composite index for filtering by status and date
-- Supports: WHERE status = ? AND created_at > ?
-- Also supports: WHERE status = ?
-- Does not support: WHERE created_at > ? alone
CREATE INDEX idx_bookings_status_created ON bookings(status, created_at DESC);
If your application involves complex search requirements beyond what standard B-tree indexes handle efficiently, it may be worth evaluating dedicated search solutions. A comparison of Elasticsearch versus MySQL full-text search covers when moving beyond standard database indexes makes sense for your use case.
Using EXPLAIN to Validate Index Usage
Always verify that your indexes are being used with EXPLAIN. Most relational databases support this command in some form. Running EXPLAIN on your slow queries shows the query execution plan and whether indexes are being used as expected.
EXPLAIN SELECT * FROM bookings
WHERE status = 'confirmed' AND created_at > '2025-01-01';
-- Key output fields to examine:
-- type: ref or range (index used) vs ALL (full table scan)
-- key: the index actually used by the query planner
-- rows: number of rows examined to produce the result
-- Extra: using filesort indicates a slow operation that may need optimization
When EXPLAIN shows a full table scan (type: ALL) on a query that should use an index, several issues could be responsible. The index may be missing entirely. The column may have a mismatched data type, such as comparing a string to a number in the query. The index may exist but remain unused because the query optimizer determines that scanning the table is faster for the specific data distribution. Checking data types and running ANALYZE to update table statistics often resolves optimizer decisions that seem counterintuitive.
Understanding EXPLAIN output is an iterative process. What the optimizer chooses depends on table statistics, data distribution, and estimated row counts. A practical slow query optimization workflow involves identifying slow queries, running EXPLAIN, adding or adjusting indexes, then running EXPLAIN again to confirm improvement.
Finding and Removing Unused Indexes
Indexes that nobody queries are pure overhead. Every index you add slows down INSERT, UPDATE, and DELETE operations because the database must update the index alongside the table data. Unused indexes also consume storage space unnecessarily.
MySQL and PostgreSQL both provide tools to identify unused indexes. MySQL's performance_schema tracks index usage statistics over time. PostgreSQL's pg_stat_user_indexes view provides similar information about index access patterns.
-- PostgreSQL: Find indexes that have never been used
SELECT
schemaname,
tablename,
indexname,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(indexrelid) DESC;
Query these sources periodically and review indexes with zero or very low scan counts. Before dropping an index, confirm that the application does not rely on it during edge cases, seasonal periods, or admin operations that run infrequently. Removing a genuinely needed index during a quiet period causes problems when that busy period arrives.
Composite Indexes and Column Order
A composite index covers multiple columns in a single index structure. Column order in composite indexes has a significant practical impact on query support. An index on (tenant_id, booking_date) can satisfy queries filtering by tenant_id alone, and queries filtering by both tenant_id and booking_date. However, it cannot satisfy queries filtering by booking_date alone, because the search would need to scan across all tenant_id values to find matching dates.
The general rule is to put the most selective column first, meaning the column that appears in WHERE clauses most frequently with the highest cardinality. High cardinality means the column has many distinct values relative to its row count. A column with millions of unique user IDs is more selective than a column with only five status values.
When queries filter by multiple columns in a known order, match that order in the composite index. If your application consistently runs queries with WHERE customer_id = ? AND status = ?, the index should be (customer_id, status), not (status, customer_id).
Check for redundant indexes before adding new composite indexes. If you have an index on (tenant_id) and another on (tenant_id, booking_date), the single-column index is redundant for queries filtering by tenant_id alone, because the composite index covers it. MySQL and PostgreSQL will not use both indexes simultaneously; each query uses a single index. Dropping the redundant index removes write overhead without losing read capability.
Covering Indexes and Partial Indexes
A covering index includes all columns a query needs, allowing the database to satisfy the request entirely from the index without accessing the table data. This technique is particularly useful for frequently run queries that return a small number of columns.
-- PostgreSQL: Covering index with INCLUDE clause
CREATE INDEX idx_covering_booking
ON bookings(tenant_id, booking_date)
INCLUDE (customer_name, status, total_amount);
-- MySQL: Non-leaf index pages store all indexed columns
CREATE INDEX idx_covering_booking
ON bookings(tenant_id, booking_date, customer_name, status, total_amount);
When the query planner can retrieve all required columns from the index alone, it performs an index-only scan. This avoids the overhead of following index pointers back to the main table data. Covering indexes work best when table data is relatively static, because frequent updates cause the database to load table pages to verify row visibility, which undermines the performance benefit.
Partial indexes, available in PostgreSQL, index only the rows matching a specified condition. This approach creates a much smaller index that covers only the relevant subset of data.
-- Index only active bookings, reducing size significantly
CREATE INDEX idx_active_bookings ON bookings(booking_date)
WHERE status = 'active';
-- Index recent records only
CREATE INDEX idx_recent_orders ON orders(created_at)
WHERE created_at > '2024-01-01';
Partial indexes reduce storage requirements and index maintenance overhead. They are especially useful for tables with imbalanced data distributions, such as an orders table where 90% of rows have status "completed" but your application queries "pending" orders most frequently.
Indexing for JOINs and aggregations
WHERE clauses are not the only place indexes matter. JOIN operations benefit from indexes on join columns, particularly on the foreign key side of the relationship. If you frequently join orders to customers on customer_id, an index on orders.customer_id makes that join significantly faster.
-- Without index: full scan of orders for each customer
-- With index: direct lookup of matching orders
CREATE INDEX idx_orders_customer ON orders(customer_id);
GROUP BY clauses also benefit from indexes that match the grouping columns. An index on (status) helps a query with GROUP BY status because the database can read the index in order and count occurrences without a separate sorting step. For complex aggregations, composite indexes with columns in GROUP BY order can eliminate expensive sort operations.
When Standard Indexes Are Not Enough
B-tree indexes work well for equality conditions and range queries on sortable data. They handle WHERE email = '[email protected]' and WHERE created_at > '2025-01-01' efficiently. However, they struggle with more complex requirements.
Full-text search across large text columns requires scanning or loading entire field contents, which B-tree indexes cannot avoid. Geospatial queries involving coordinates or polygons need R-tree or GiST indexes instead. High-cardinality columns with highly skewed access patterns may benefit from index-only scans or covering indexes as described above.
When your query complexity outgrows what standard database indexes handle efficiently, a dedicated search engine or specialised database may be the appropriate solution. Evaluating options like Elasticsearch alongside your existing database can help you understand where the boundary lies for your specific workload.
Maintaining Indexes Over Time
Database schemas evolve. Queries change. Data distributions shift. An index that was optimal six months ago may no longer serve current query patterns. Periodic review of index effectiveness keeps performance expectations realistic.
Monitor query performance over time rather than only at setup. A query that ran quickly when the table had 100,000 rows may slow noticeably at 5 million rows. Recurring performance reviews catch these shifts before they become user-facing problems. This maintenance approach applies whether you run MySQL, PostgreSQL, or another relational database.
If you are evaluating which database platform fits your application needs, understanding the indexing differences between options matters. A comparison of PostgreSQL versus MySQL covers platform-specific indexing features and performance characteristics relevant to business applications.