Indexes in SQL database can be your best friend or your worst enemy. Set them up correctly, and your data will be queryable with maximum efficiency; set them poorly, and both queries and writes will be increasingly slow and deadlock prone.
Here are three things you should know about how to design SQL indexes:
1. Queries use one index per joined table.
When your SQL engine is planning the execution of a query, it can only lean on one index per joined table per query. You may have several indexes on a given table, but in any given query, the SQL engine will only use one of those indexes.
Additional indexes may not ever be used, meaning you’ve added the overhead of maintaining additional indexes without any query speed benefit. Extraneous indexes might result in the query planner selecting a less efficient index versus a more efficient index.
Make sure when you build an index against a table, it’s because you have real queries that you want to execute that can benefit from them. More indexes aren’t necessarily better.
2. Cardinality is important.
The cardinality of an index is the number of unique values within it. Your database table may have a billion rows in it, but if it only has 8 unique values among those rows, your cardinality is very low.
A low cardinality index is not a major efficiency gain. Most SQL indexes are binary search trees (B-Trees). Versus a serial scan of every row in a table to find matching constraints, a B-Tree logarithmically reduces the number of comparisons that have to be made. The gains from executing a search against a B-Tree are very low when the size of the tree is small.
So putting an index on a Boolean field? Or an enumerated value field? A cardinality of a very small number of distinct values among a very large number of rows will not yield noticeable efficiency gains. Save your database indexes for fields with very high cardinality to ensure the gains from scanning a B-Tree are largest versus sequential scans.
3. Multi-column indexes are flexible.
Creating an index on multiple columns allows you to use multiple values in your constraints in lookups against a single index, and if you’re regularly constraining your queries against multiple columns, multi-column indexes are an indispensable workhorse.
Additionally, multi-column indexes can still be used by the query planner even when all of the columns in the index aren’t constrained upon. For example, if you define a multi-column index against a user id, a start time, and an end time (in that order), that index can be used for queries against the user id alone, as well as the user id & start time together. The order of columns in the index definition matter: so long as your subset of columns follows the index definition left-to-right, that multi-column index is effectively and efficiently used. So, using the same example, even if your database engine supports using the multi-column index for queries against start time and stop time together (some do – some don’t), it’s not nearly as efficient as it would be if start time and stop time were the first two columns in the index definition.
This also means that if you have a multi-column index across several columns, having a single-column index against the first column in the multi-column index is redundant and superfluous – the multi-column index can be used just as easily in queries only constraining that one, left-most column.
By incorporating these realities of SQL indexes into your database design during Agile development, you can better ensure that the indexes you build will give you the greatest gains with the minimum overhead.