Introduction to MongoDB Indexing

Indexes make queries more efficient. With an index, a query can find matching documents without performing a full collection scan. In this article, we explore MongoDB indexing, including examples, performance, and best practices.

What is an index?

An index is a data structure that stores a small portion of a collection's data in an easily traversable form. Specifically, an index stores the values of a specific field (or set of fields) in sorted order. This makes it possible to quickly check for equality or perform range-based queries without performing full collection scans.

MongoDB automatically creates a unique index on the _id field of every collection. This provides a default way of differentiating documents. You can create additional indexes on fields that are frequently queried to improve performance.

Example

Take the following sample collection:

{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 6, b: "no" }
{ _id: ObjectId(), a: 2, b: "cd" }
{ _id: ObjectId(), a: 4, b: "jk" }
{ _id: ObjectId(), a: 3, b: "ef" }
{ _id: ObjectId(), a: 5, b: "lm" }

Let's say we run something like:

db.collection.find({a:3})

Without an index, MongoDB must perform a full collection scan. The query must examine every document in the collection to find the matching documents.

While a full collection scan may not seem like a big deal with only 6 records, you can imagine how expensive these lookups get when collections have millions of documents.

To prevent full collection scans, we can create an index on the a field:

db.collection.createIndex({a:1})

This creates an ascending index on the a field. The index stores the values of a in ascending order, each with a reference back to its document. Conceptually, the index orders the documents like this:

{ _id: ObjectId(), a: 1, b: "ab" }
{ _id: ObjectId(), a: 2, b: "cd" }
{ _id: ObjectId(), a: 3, b: "ef" }
{ _id: ObjectId(), a: 4, b: "jk" }
{ _id: ObjectId(), a: 5, b: "lm" }
{ _id: ObjectId(), a: 6, b: "no" }

Now if we run something like db.collection.find({a:3}), the query can use the index instead of performing a full collection scan. Because the index keeps the values of a in sorted order, the query can locate the matching entries without examining every document.
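
To make the difference concrete, here is a minimal sketch in plain JavaScript. It is not how MongoDB implements indexes internally (MongoDB uses B-tree structures); it only illustrates why sorted values help. The function names collectionScan, buildIndex and indexLookup are hypothetical, the _id values are simplified to integers, and the lookup assumes each value of a appears only once.

```javascript
// Sample collection from the example above (ids simplified to integers).
const docs = [
  { _id: 0, a: 1, b: "ab" },
  { _id: 1, a: 6, b: "no" },
  { _id: 2, a: 2, b: "cd" },
  { _id: 3, a: 4, b: "jk" },
  { _id: 4, a: 3, b: "ef" },
  { _id: 5, a: 5, b: "lm" },
];

// Full collection scan: every document is examined.
function collectionScan(collection, value) {
  let examined = 0;
  const matches = [];
  for (const doc of collection) {
    examined += 1;
    if (doc.a === value) matches.push(doc);
  }
  return { matches, examined };
}

// Hypothetical index: one { key, position } entry per document, sorted by key.
function buildIndex(collection) {
  return collection
    .map((doc, position) => ({ key: doc.a, position }))
    .sort((x, y) => x.key - y.key);
}

// Binary search over the sorted entries (assumes unique keys for simplicity).
function indexLookup(collection, index, value) {
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    examined += 1;
    if (index[mid].key === value) {
      return { matches: [collection[index[mid].position]], examined };
    }
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return { matches: [], examined };
}

const scan = collectionScan(docs, 3);
const lookup = indexLookup(docs, buildIndex(docs), 3);
console.log(scan.examined, lookup.examined);
```

With this sample data, the scan examines all six documents, while the sorted lookup needs far fewer comparisons. The gap grows as the collection grows.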

Index Types

In our example, we created a single field index. While single field indexes are the most frequently used, other types of indexes exist to work with more complex data structures. Below is a brief description of the different types of indexes supported in MongoDB:

Single Field

An index created on a single field. The default MongoDB-defined _id index is an example of a single field index.

Compound Index

An index on multiple fields. For example:

db.collection.createIndex({a:1, b:-1})

This creates an index that sorts first on the a field in ascending order. Within each value of a, the index sorts on b in descending order. Note that the order of the fields matters: this index can support queries on a alone, or on a and b together, but not queries on b alone.
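
The ordering described above can be sketched with a plain JavaScript comparator. This is only an illustration of the sort order implied by the key pattern {a:1, b:-1}, not MongoDB's implementation; the compareKeys function and the sample entries are made up for the example.

```javascript
// Comparator mirroring the key pattern {a: 1, b: -1}.
function compareKeys(x, y) {
  if (x.a !== y.a) return x.a - y.a; // a ascending
  if (x.b < y.b) return 1;           // b descending within equal a
  if (x.b > y.b) return -1;
  return 0;
}

// Hypothetical index entries with repeated values of a.
const entries = [
  { a: 2, b: "cd" },
  { a: 1, b: "ab" },
  { a: 2, b: "xy" },
  { a: 1, b: "zz" },
];

// Copy before sorting so the original array is left untouched.
const sorted = [...entries].sort(compareKeys);
console.log(sorted);
```

All entries with the same a end up next to each other, which is why a query on a alone can still use the index. Entries with the same b are scattered across the index, so a query on b alone gains nothing from this ordering.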

Multikey Index

Multikey indexes are used to index the content stored in arrays. If you create an index on a field that contains an array, MongoDB automatically creates a multikey index for that field, with a separate index entry for each element of the array.
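
As a rough illustration of indexing array contents, the sketch below builds one index entry per array element in plain JavaScript. The multikeyEntries function, the posts collection and its tags field are hypothetical names chosen for this example, and the entry layout is a simplification rather than MongoDB's storage format.

```javascript
// Build one { key, position } entry per array element (or per scalar value).
function multikeyEntries(collection, field) {
  const entries = [];
  collection.forEach((doc, position) => {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, position });
  });
  // Keep the entries sorted by key, as an index would.
  return entries.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
}

// Hypothetical collection with an array field.
const posts = [
  { title: "p1", tags: ["mongodb", "indexes"] },
  { title: "p2", tags: ["indexes"] },
];

const tagIndex = multikeyEntries(posts, "tags");
console.log(tagIndex);
```

Here two documents produce three entries, because the first document contributes one entry per tag. A lookup for a single tag can then find every document whose array contains it.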

Geospatial Index

MongoDB supports geospatial indexes for querying data that represents objects defined in a geometric space.

Text Index

A text index supports text search queries on fields that contain string content.

Hashed Index

A hashed index is used to support hash-based sharding in MongoDB.

Performance

Remember that indexes require space in your database. In fact, an index requires at least 8 kB of data space. When you create an index, you are ultimately storing more information in your database. While increased performance can far outweigh the cost of additional storage, it's important to consider resource availability when creating indexes in MongoDB.

Also remember that indexes must be kept in sync with the collections they represent. This means any insert, update, or delete that touches an indexed field also requires an update to the index. While such updates are handled automatically by MongoDB, they add work to every write operation and can reduce write performance.
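
The extra work on each write can be sketched in plain JavaScript. This is a simplified model under stated assumptions, not MongoDB's implementation: the index is a sorted array of { key, position } entries on a field a, and insertWithIndex is a hypothetical helper.

```javascript
// Insert a document and keep a sorted index on field a in sync.
function insertWithIndex(collection, index, doc) {
  // Step 1: the write to the collection itself.
  collection.push(doc);
  const position = collection.length - 1;

  // Step 2: the additional write to the index.
  // Binary search for the first entry whose key is not less than doc.a.
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].key < doc.a) lo = mid + 1;
    else hi = mid;
  }
  index.splice(lo, 0, { key: doc.a, position });
}

const collection = [];
const index = [];
for (const a of [5, 1, 3]) insertWithIndex(collection, index, { a });
console.log(index.map((entry) => entry.key)); // keys come out as 1, 3, 5
```

Every insert performs step 2 once per index on the collection, so each additional index adds to the cost of a write.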

Best Practices

Your indexing strategy should be largely based on the nature of your application's queries. Specifically, the ratio of reads to writes should dictate the decision to index a field (or set of fields). If your application is frequently reading from a collection with infrequent updates, an index makes sense. If your application performs frequent updates on a field, it may not be a good idea to index because of the additional writes required to keep the index in sync.

It's also important to consider the amount of memory on your system. To maximize the efficiency of indexed queries, you want to ensure that your index can fit entirely in RAM. While not required, this ensures the system doesn't have to read the index from disk.

Selectivity is also an important factor. You want to create indexes on fields with many distinct values, so that each value matches only a small share of the collection. This narrows search results by limiting the number of documents that share the same indexed value. For example, you wouldn't create an index on a field that has only two potential values.

You can specify a unique index (by passing the option {unique: true} to createIndex) to ensure high selectivity. A unique index requires that every document has a different value for the indexed field. The default _id index is an example of a unique index.
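
One rough way to gauge selectivity is the ratio of distinct values to total documents. The sketch below computes that ratio in plain JavaScript over an in-memory array. The selectivity function and the users sample are hypothetical, and the ratio is only a heuristic, not a metric that MongoDB reports.

```javascript
// Ratio of distinct values to documents: near 1 is highly selective,
// near 0 means many documents share the same value.
function selectivity(collection, field) {
  if (collection.length === 0) return 0;
  const distinct = new Set(collection.map((doc) => doc[field]));
  return distinct.size / collection.length;
}

// Hypothetical sample data.
const users = [
  { email: "a@example.com", status: "active" },
  { email: "b@example.com", status: "inactive" },
  { email: "c@example.com", status: "active" },
  { email: "d@example.com", status: "active" },
];

console.log(selectivity(users, "email"));  // every value is distinct
console.log(selectivity(users, "status")); // only two distinct values
```

In this sample, email is a better index candidate than status. On a large collection the ratio for a two-valued field would be close to zero.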

Conclusion

Indexing provides a faster way to query data. An index stores sorted field values so queries don't have to perform full collection scans to find matching results. Indexes should be used on fields that are frequently queried and highly selective (ideally unique). Remember that indexing results in faster reads but more expensive writes. The nature of your application's queries should largely dictate your indexing strategy.
