Create Index | MongoDB Indexing Tutorial

Preface: Why Use Indexes in MongoDB?

Indexing allows you to query collections faster without scanning entire collections.

Let's say we seed a Mongo collection users with 10 million records...

const { MongoClient } = require('mongodb');
const faker = require('faker');

const url = 'mongodb://localhost:27017';
const client = new MongoClient(url);
const dbName = 'myProject';
const TOTAL_USERS = 10000000;
const BATCH_SIZE = 10000; // insert in batches so the full data set never sits in memory

async function main() {
  await client.connect();
  console.log('Connected successfully to server');
  const db = client.db(dbName);
  // Seed the same collection that the queries below read from.
  const collection = db.collection('users');
  for (let inserted = 0; inserted < TOTAL_USERS; inserted += BATCH_SIZE) {
    const users = [];
    for (let i = 0; i < BATCH_SIZE; i++) {
      users.push({
        name: faker.name.findName(),
        email: faker.internet.email(),
        age: faker.datatype.number({ min: 18, max: 100 }),
      });
    }
    await collection.insertMany(users);
  }
  return 'done.';
}

main()
  .then(console.log)
  .catch(console.error)
  .finally(() => client.close());

and we run a query against this users collection...

db.getCollection('users').find({age:{"$lt":30}})

This takes around 4 seconds to execute locally. This is because Mongo must scan all 10 million records to identify which users are under 30.

Now let's add an index to the age field.

db.getCollection('users').createIndex({age:1})

When we run the same query, it takes roughly half the time to execute. This is why indexing is so important and powerful in MongoDB.

How does indexing work?

When you create an index, Mongo replicates a portion of collection data into a sorted B-Tree data structure. This sorted tree structure allows for faster lookups. Rather than scan an entire collection, Mongo can traverse this B-Tree to more efficiently find the data it cares about.
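To make the idea concrete, here is a minimal sketch in plain JavaScript. It is only a toy model: a sorted array stands in for the B-Tree, and the names (buildAgeIndex, countBelow, findYoungerThan, scanYoungerThan, sampleUsers) are made up for illustration and are not part of MongoDB.

```javascript
// Toy model only: a sorted array stands in for the B-Tree.
// MongoDB's real index structure is more sophisticated.
function buildAgeIndex(docs) {
  // Each entry pairs an indexed value with the position of its document.
  return docs
    .map((doc, pos) => ({ age: doc.age, pos }))
    .sort((a, b) => a.age - b.age);
}

// Binary search: returns how many index entries have age < limit.
function countBelow(index, limit) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].age < limit) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// "Indexed" version of find({ age: { $lt: limit } }): only matching entries are touched.
function findYoungerThan(docs, index, limit) {
  return index.slice(0, countBelow(index, limit)).map((entry) => docs[entry.pos]);
}

// "Unindexed" version: every document is examined.
function scanYoungerThan(docs, limit) {
  return docs.filter((doc) => doc.age < limit);
}

const sampleUsers = [
  { name: 'Ann', age: 41 },
  { name: 'Bob', age: 25 },
  { name: 'Cy', age: 67 },
  { name: 'Dee', age: 19 },
];
const ageIndex = buildAgeIndex(sampleUsers);
console.log(findYoungerThan(sampleUsers, ageIndex, 30).map((u) => u.name)); // [ 'Dee', 'Bob' ]
```

Both functions return the same users, but the indexed version finds the cut-off point in a handful of comparisons instead of checking every document.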

While indexing can create huge performance benefits, there is a cost to maintaining the index. When records are inserted into a collection, the index must be updated with the new data. Remember that indexing a collection ultimately increases the size of your database.

For these reasons, it's important to use indexes only where appropriate. Indexing collections based on the most queried fields is a good starting point.

Create a Single Field Index

db.getCollection('users').createIndex({age:1})

This creates a single field index on the users collection.

The 1 specifies ascending order; -1 would specify descending order.

Note that for a single field index, the order doesn't matter, because Mongo can traverse the index in either direction. You will get the benefits of the sorted B-Tree traversal either way :).

When should you use a single field index?

Use a single field index when you are frequently querying by one field.

Create a Compound Index

db.getCollection('users').createIndex({email:1, name:-1})

This creates a compound index on the users collection.

This is super useful for supporting queries like this:

db.getCollection('users').find({email:"alex@gmail.com", name:"Sam"})

This compound index can also be leveraged for a simpler query like this:

db.getCollection('users').find({email:"alex@gmail.com"})

but NOT this...

db.getCollection('users').find({name:"Sam"})

The reason is that Mongo uses index prefixes (the beginning subsets of the indexed fields). To this extent, the ordering of the fields in the index matters. Without the email field being included in the query, Mongo cannot leverage the compound index to query by name.
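The prefix rule can be sketched as a small helper in plain JavaScript. This is a simplified model that assumes plain equality filters; usablePrefixLength and emailNameIndex are hypothetical names, and the real query planner weighs more factors than this.

```javascript
// Simplified model: counts how many leading index fields a query constrains.
// Assumes plain equality filters; not MongoDB's actual planner logic.
function usablePrefixLength(indexSpec, query) {
  let count = 0;
  for (const field of Object.keys(indexSpec)) {
    if (!(field in query)) break; // stop at the first index field the query skips
    count += 1;
  }
  return count;
}

const emailNameIndex = { email: 1, name: -1 };
console.log(usablePrefixLength(emailNameIndex, { email: 'alex@gmail.com', name: 'Sam' })); // 2
console.log(usablePrefixLength(emailNameIndex, { email: 'alex@gmail.com' })); // 1
console.log(usablePrefixLength(emailNameIndex, { name: 'Sam' })); // 0
```

A result of 0 means the query skips the leading field, so in this model the compound index offers no help.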

Compound Indexes and sorting

The compound index {email:1, name:-1} is also super useful for supporting sort queries like this:

db.getCollection('users').find().sort({email:1, name:-1})

or this...

db.getCollection('users').find().sort({email:-1, name:1})

but NOT this...

db.getCollection('users').find().sort({email:1, name:1})

and also NOT this...

db.getCollection('users').find().sort({name:-1, email:1})

This is because indexes can only be leveraged on sort operations if the order of fields matches the order of fields defined in the index. Additionally, only the index key pattern (email:1, name:-1) and its inverse (email:-1, name:1) can apply to sort operations.
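The two sort rules (same field order, and directions that either all match or all flip) can be expressed as a short check. This is a sketch under simplifying assumptions: canSortWithIndex is a hypothetical helper, and it ignores cases where equality filters in the query change which sorts an index can support.

```javascript
// Simplified model of whether a sort can be served by walking an index.
// Ignores the effect of query filters; not MongoDB's actual planner logic.
function canSortWithIndex(indexSpec, sortSpec) {
  const indexFields = Object.keys(indexSpec);
  const sortFields = Object.keys(sortSpec);
  if (sortFields.length === 0 || sortFields.length > indexFields.length) return false;
  // Field order must follow the index, starting from its first field.
  if (!sortFields.every((field, i) => field === indexFields[i])) return false;
  // Directions must all match the index (forward walk) or all be flipped (backward walk).
  const sameDirection = sortFields.every((f) => sortSpec[f] === indexSpec[f]);
  const inverseDirection = sortFields.every((f) => sortSpec[f] === -indexSpec[f]);
  return sameDirection || inverseDirection;
}

const sortIndex = { email: 1, name: -1 };
console.log(canSortWithIndex(sortIndex, { email: 1, name: -1 })); // true
console.log(canSortWithIndex(sortIndex, { email: -1, name: 1 })); // true
console.log(canSortWithIndex(sortIndex, { email: 1, name: 1 })); // false
console.log(canSortWithIndex(sortIndex, { name: -1, email: 1 })); // false
```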

When should you use a compound index?

Use a compound index when you frequently query on multiple fields. Also remember that you don't need to define a single field index if your compound index matches on the prefix of the index field...aka

db.getCollection('users').createIndex({email:1, name:-1})

also satisfies

db.getCollection('users').createIndex({email:1})

Create a Multikey Index

A multikey index is something MongoDB automagically creates for you when you index a field that holds an array.

Say your collection's document structure looks like this:

{
    "_id" : ObjectId("62577886b29616254381f6b9"),
    "name" : "Lynne Bruen",
    "email" : "Rylan.Morissette41@gmail.com",
    "age" : 33,
    "altInfo" : [ 
        {
            "otherEmail" : "Janie.Bartoletti24@yahoo.com",
            "otherName" : "Priscilla Douglas"
        }, 
        {
            "otherEmail" : "Reymundo60@yahoo.com",
            "otherName" : "Miss Dexter Gutmann"
        }, 
        {
            "otherEmail" : "Heloise.Leffler12@yahoo.com",
            "otherName" : "Janie Windler"
        }
    ]
}

If you create an index on altInfo like this:

db.getCollection('documents').createIndex({altInfo:1})

then Mongo will create a multikey index automatically.

This also applies to sub documents. For example, you can also create an index like this...

db.getCollection('documents').createIndex({"altInfo.otherEmail":1})

or this...

db.getCollection('documents').createIndex({"altInfo.otherEmail":-1, "altInfo.otherName":-1})
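What makes an index "multikey" is that one document contributes several index entries, one per array element. The sketch below models that in plain JavaScript; multikeyEntries and sampleDoc are hypothetical names, and this is an illustration of the idea rather than how MongoDB stores its keys.

```javascript
// Toy model: lists the index keys a multikey index would hold for one document.
// `path` is a dotted field path such as 'altInfo.otherEmail'.
function multikeyEntries(doc, path) {
  let values = [doc];
  for (const part of path.split('.')) {
    values = values
      .flatMap((v) => (Array.isArray(v) ? v : [v])) // step into array elements
      .filter((v) => v !== null && typeof v === 'object')
      .map((v) => v[part])
      .filter((v) => v !== undefined);
  }
  // A final array value contributes one entry per element.
  return values.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

const sampleDoc = {
  name: 'Lynne Bruen',
  altInfo: [
    { otherEmail: 'Janie.Bartoletti24@yahoo.com', otherName: 'Priscilla Douglas' },
    { otherEmail: 'Reymundo60@yahoo.com', otherName: 'Miss Dexter Gutmann' },
    { otherEmail: 'Heloise.Leffler12@yahoo.com', otherName: 'Janie Windler' },
  ],
};
// One document, three index entries.
console.log(multikeyEntries(sampleDoc, 'altInfo.otherEmail'));
```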

Limitations to Multikey Indexes

A compound multikey index only works when at most one of its indexed fields holds an array in any given document.

For example if your data looks like this...

{a:[1,2,3],b:[4,5,6]}

then this WILL NOT work

db.getCollection('documents').createIndex({a:1,b:1})

Furthermore, if a compound multikey index already exists and you try to add a new document in which more than one of the indexed fields is an array, the insert will fail.

For example, if you create this compound index...

db.getCollection('documents').createIndex({a:1, b:1})

then this insert works...

db.getCollection('documents').insertOne({a:[1,2,3], b:1})

and this insert works...

db.getCollection('documents').insertOne({a:1, b:[1,2,3]})

but NOT this...

db.getCollection('documents').insertOne({a:[1,2,3], b:[1,2,3]})

Remember that multikey indexes are largely handled behind the scenes for you. It's important to understand when these are created and the limitations they introduce when inserting / querying documents with array fields.
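The insert rule above can be pre-checked in application code before a write is attempted. This is a minimal sketch that only looks at top-level fields; indexedArrayFields and canInsertUnderCompoundIndex are hypothetical helpers, not MongoDB APIs.

```javascript
// Returns the indexed fields whose value in this document is an array.
// Only top-level fields are considered in this simplified model.
function indexedArrayFields(indexSpec, doc) {
  return Object.keys(indexSpec).filter((field) => Array.isArray(doc[field]));
}

// A compound multikey index tolerates at most one array-valued indexed field per document.
function canInsertUnderCompoundIndex(indexSpec, doc) {
  return indexedArrayFields(indexSpec, doc).length <= 1;
}

const abIndex = { a: 1, b: 1 };
console.log(canInsertUnderCompoundIndex(abIndex, { a: [1, 2, 3], b: 1 })); // true
console.log(canInsertUnderCompoundIndex(abIndex, { a: 1, b: [1, 2, 3] })); // true
console.log(canInsertUnderCompoundIndex(abIndex, { a: [1, 2, 3], b: [1, 2, 3] })); // false
```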

Create a Text Index

db.getCollection('documents').createIndex({name:"text"})

This creates a text index on the name field. This allows you to run $text searches in Mongo...

db.getCollection('documents').find({
 $text: {
  $search: "Vincent",
  $caseSensitive: false
 }
})

$text searches are a preferred way to search for text based string fields in Mongo. You can't run $text based queries without a text index.

A collection can only have one text index. The good news is a text index can cover multiple fields...

db.getCollection('documents').createIndex(
  {
    email:"text",
    name:"text"
  }
)

When should you create a text index?

Create a text index when you want to search string based fields with $text.

Create a Wildcard Index

db.getCollection('documents').createIndex({"altInfo.$**":1})

This creates a wildcard index which will index all of the subfields / elements for a given field.

This index makes sense if your data looks like this...

{"altInfo": {"emails":["sam@gmail.com", "sarah@gmail.com"]}}
{"altInfo": {"favoriteColor":"red"}}
{"altInfo": "defaultUser"}
{"altInfo": {"middleName":"Joseph"}}

Notice how altInfo stores different attributes and data types. This is a key advantage of MongoDB's "schemaless" design as attributes can be added on the fly.

When should you create a wildcard index?

Creating a wildcard index makes sense when you want to index fields whose attributes/sub elements aren't known. This acts as a "catch all" for indexing a flexible schema in MongoDB.
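To get a feel for what "catch all" means, the sketch below walks a document and lists the path and value pairs that a wildcard index rooted at altInfo would roughly cover. It is a rough model based on my understanding of how wildcard indexes traverse documents and arrays; wildcardEntries is a hypothetical helper, and edge cases in the real server may differ.

```javascript
// Rough model: collects [path, value] pairs found at or below a root field.
function wildcardEntries(doc, root) {
  const entries = [];
  const walk = (value, path) => {
    if (Array.isArray(value)) {
      for (const item of value) {
        // Assumption: an array nested directly inside an array is kept as one value.
        if (Array.isArray(item)) entries.push([path, item]);
        else walk(item, path); // array elements share the array's path
      }
    } else if (value !== null && typeof value === 'object') {
      for (const [key, child] of Object.entries(value)) walk(child, `${path}.${key}`);
    } else {
      entries.push([path, value]);
    }
  };
  if (root in doc) walk(doc[root], root);
  return entries;
}

const flexibleDocs = [
  { altInfo: { emails: ['sam@gmail.com', 'sarah@gmail.com'] } },
  { altInfo: { favoriteColor: 'red' } },
  { altInfo: 'defaultUser' },
  { altInfo: { middleName: 'Joseph' } },
];
for (const doc of flexibleDocs) {
  console.log(wildcardEntries(doc, 'altInfo'));
}
```

Every distinct path that shows up in the data produces entries, which is why the index can grow quickly when the schema is very loose.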

While wildcard indexes are extremely convenient, they should not be seen as a shortcut to indexing individual fields. This is because there is a hefty performance cost to managing such a flexible index. Specifically, inserts, updates, and deletes can become noticeably slower, because every indexed subfield has to be kept up to date, and queries served by a wildcard index are often slower than queries served by a targeted index on the same field.

Other Index Types

Supporting Geospatial Queries

MongoDB supports Geospatial data through GeoJSON data and legacy coordinate pairs. Geospatial data describes objects as they relate to the surface of the earth.

2dsphere Indexes

db.getCollection('documents').createIndex({location:"2dsphere"})

When should you use a 2dsphere index?

Use a 2dsphere index when you want to perform queries that calculate geometries on an earth-like sphere.

2d Indexes

db.getCollection('documents').createIndex({location:"2d"})

When should you use a 2d index?

Use a 2d index when you want to query data stored as points in a two-dimensional plane.

geoHaystack Indexes

db.getCollection('documents').createIndex({location:"geoHaystack", type:1}, {bucketSize:1})

When should you use a geoHaystack index?

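Use a geoHaystack index when you want to perform queries on geospatial data over a small physical area. Note that geoHaystack indexes were deprecated in MongoDB 4.4 and removed in MongoDB 5.0, so on newer versions use a 2d or 2dsphere index instead.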

Hashed Indexes

db.getCollection('documents').createIndex({location:"hashed"})

When should you use a hashed index?

Use hashed indexes when you want to shard a collection on a hashed key, which spreads the data more evenly across a sharded cluster.

Conclusion

Indexing your database is one of the most challenging and important parts of non-relational database design. It's important to understand the nature of your application and which fields you'll be querying against most frequently when designing your indexing strategy.

While indexing is designed to make querying your database faster, remember that there is an opportunity cost to all of the bookkeeping. This is because indexes must be maintained and ultimately increase the size of your database.