Compound Index | MongoDB Indexing Tutorial

Compound Index | MongoDB Indexing Tutorial

Index Performance | MongoDB Indexing Tutorial

Indexing Types | MongoDB Indexing Tutorial

Unique Index | MongoDB Indexing Tutorial

List Indexes | MongoDB Indexing Tutorial

Create Index | MongoDB Indexing Tutorial

Preface: What is an index?

An index allows you to more efficiently query data in MongoDB.

An index stores a subset of your collection in a B-Tree data structure. This sorted structure allows Mongo to efficiently query the collection without performing a collection scan on every single document.

While an index can improve query performance, there is an opportunity cost to managing the extra data structure. When you insert documents into your collection, these indexes must also be updated and maintained.

You can create an index like this...

db.people.createIndex({name:1})

This operation creates a single field index on the name field of the people collection.

The 1 indicates order...(1 = ascending) (-1 = descending)

Now when you perform a query like...

db.people.find({name:"Eric"})

Mongo can quickly consult the sorted name index to more efficiently find a match.

For a single field index (like the one above), order doesn't really matter. For compound indexes, the ordering is more important.

What is a compound index?

A compound index is a single index that includes multiple fields.

How to create a compound index in MongoDB

db.people.createIndex({name:1, email:-1})

This creates a compound index on the people collection.

This index first sorts people by name in ascending order and THEN by email in descending order.

How does a compound index work?

A compound index works by storing a subset of a collection's data in a sorted B-Tree data structure. When you create a compound index, you are technically duplicating the information in your database.

While this sounds expensive, it makes for much faster lookups. The reason being is the sorted data can be more easily traversed to find results faster.

Let's revisit our example...

db.people.createIndex({name:1, email:-1})

This compound index creates a sorted B-Tree structure where records are first stored by name in ascending order. The sorted records then store nested email values in descending order.

Sorting is key to making a compound index efficient. By sorting records based on name, Mongo can quickly eliminate keys it doesn't need to scan because of how a B-Tree data structure works.

Now, when you perform a Mongo query like this...

db.people.find({name:"Sam", email:"fred@gmail.com"})

it will be significantly faster because Mongo can quickly scan an ordered set of name keys versus every single document in the collection.

ORDER MATTERS!!!

The order of fields matters when creating a compound index. For example...

db.people.createIndex({name:1, email:-1})

will not make this query run any faster...

db.people.find({email:"sam@gmail.com"})

This is because the compound index stores the information sorted first by name then by email.

More specifically, Mongo uses prefixes when evaluating compound indexes.

What is a prefix?

A compound index includes a set of prefixes. A prefix is a fancy term for the beginning combination of fields in the index.

Let's say you create the following compound index...

db.people.createIndex({name:1, email:1, address:1})

This index has the following prefixes:

  • {name:1}
  • {name:1, email:1}

This means our compound query supports queries on...

  • name
  • name and email
  • name and email and address

This also means that our compound query DOES NOT support queries on

  • email
  • email and address
  • address

Long story short, the order of the fields matters because it dictates the index prefixes available for querying.

While you can utilize the index on name and address, it would be more efficient to omit the email field in this case.

When should you use a compound index?

Use a compound index when you query a collection frequently with the same set of fields. More specifically, if you are running queries frequently against the same collection based on equality, sorting, and ranges then a compound index may be appropriate.

For example...

db.people.find({name:"Sam", age:{$lt:30}).sort({email:1})

This query checks for equality...

name:"Sam"

a range based on the age field...

{$lt:30}

and performs a sort on email...

sort({email:1})

By creating a compound index following the ESR rule (see below), you're queries will be much faster...

db.people.createIndex({name:1, email:1, age:1})

As described in the "ESR rule" below, the order is important for optimizing the compound query.

Two birds one stone...

Remember that the index prefixes can be used to satisfy simpler (single field) indexes. For example...

db.people.createIndex({name:1, email:1, age:1})

will also satisfy...

db.people.createIndex({name:1})

No need to create a single field index on name if it's already a prefix in another compound index.

Compound index performance

In general, an index will greatly improve the read operations of your database.

Depending on the size of your collection, an index can slow things down. This is especially true when considering things like cardinality.

What is cardinality?

Cardinality = number of unique value

Remember that an index stores partial data in a sorted B-Tree like data structure. If a field having low cardinality (like gender or boolean flag) is indexed then the index won't benefit much from a sorted data set.

In cases like this, indexes can actually hurt database performance. Not only does your database have to do all the "book keeping" of managing the index on CRUD operations, it also has to scan a large portion of data.

The ESR Rule

The Equality, Sort, Range (ESR) rule states that compound indexed fields should be in order of equality, then sorting, then range.

Let's take a look at our previous example...

db.people.createIndex({name:1, email:1, age:1})

This compound index follows the ESR rule if your queries look something like this...

db.people.find({name:"Sam", age:{$lt:30}).sort({email:1})

Notice how equality is performed on the name field.

Notice how sorting is performed on the email field.

Notice how a range based operation ($lte) is performed on the age field.

Equality matches should come first because they are selective and quickly eliminate the amount of data to filter/sort.

Sorting should follow equality matches because fewer records must be sorted. This also allows Mongo to do a non-blocking sort.

Range operations should come after equality and sorting to minimize range based scans and being loosely bound to index keys.

Conclusion

Compound indexes allow you to create single indexes with multiple fields.

Prefixes are subsets of indexed fields in a compound index. Prefixes are why order matters when defining a compound index.

Compound indexes can satisfy single field indexes through their prefixes.

It's important to follow the ESR rule and choose index fields with low cardinality when creating compound indexes.

Your thoughts?