How does MongoDB handle index builds on large collections?
Why Interviewers Ask This
Advanced questions like this reveal whether a candidate has internalized MongoDB deeply enough to make architectural decisions. Strong answers demonstrate both breadth and depth of experience.
Answer
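Building an index on a large collection is expensive. MongoDB has to read every document, extract and sort the index keys, and write the new index structure, all while the application keeps using the collection. MongoDB 4.2 replaced the two earlier approaches with a single hybrid index build process:
- Foreground builds blocked all reads and writes on the collection for the whole build.
- Background builds allowed access but were slower and produced less compact indexes.
The background option is still accepted but ignored.
Hybrid index build (MongoDB 4.2+): the build holds an exclusive (X) lock on the collection only briefly, at the start and at the end. For the bulk of the work (scanning documents and sorting keys), it holds an intent exclusive (IX) lock that it yields periodically, so reads and writes continue to interleave with the build. This makes index builds largely non-blocking in practice. However, each exclusive window still has to wait for in-progress operations on the collection to release their locks, and other operations on the collection wait while it is held.
Index build flow:
(1) Briefly take an exclusive lock to set up the build, then downgrade to an intent lock.
(2) Scan all documents, feeding their index keys into an external sorter. The sorter spills to temporary files on disk when it exceeds its memory limit (maxIndexBuildMemoryUsageLimitMegabytes).
(3) While the scan runs, record index changes caused by concurrent writes in a temporary "side writes" table rather than in the unfinished index.
(4) Bulk-load the sorted keys into the new index structure.
(5) Drain the side writes table, applying the writes that arrived during the build.
(6) Take a shared (S) lock, which blocks writes but not reads, to drain the remaining side writes. Then upgrade to an exclusive lock to finalize and commit the index.
(7) Release the lock.
Performance impact: index builds consume significant disk I/O (a full collection scan plus the sort) and CPU. That load competes with the application workload on the member doing the build.
Replica sets: builds are coordinated across members.
In MongoDB 4.2, the primary builds the index first. After it commits, the index creation replicates, and each secondary then builds the index itself.
Starting in MongoDB 4.4, builds are simultaneous. The primary records a startIndexBuild oplog entry, every data-bearing member builds the index in parallel, and the primary writes commitIndexBuild only after the commit quorum is met. The default quorum, "votingMembers", requires every data-bearing voting member to finish. It can also be set to "majority" or to a specific number of members.
Rolling index build (manual): use this when even a hybrid build would put too much load on a production member. It builds the index one member at a time:
- Restart a secondary as a standalone (outside the replica set), build the index, restart it as a replica set member, and let it catch up.
- Repeat for each secondary.
- Finally, step down the primary with rs.stepDown() and repeat the process on it once it is a secondary.
Monitoring: db.currentOp({ "command.createIndexes": { $exists: true } }) shows in-progress createIndexes commands. The build itself can also appear as an internal operation (op: "none") whose msg starts with "Index Build". The filter { $or: [ { "command.createIndexes": { $exists: true } }, { op: "none", msg: /^Index Build/ } ] } catches both forms, and the msg and progress fields report how far the scan has got. db.adminCommand({ currentOp: 1, $all: true }) returns all operations, including idle connections and system operations.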
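The sketch below shows why holding only an intent lock lets reads and writes continue during the build. It assumes the standard multiple-granularity compatibility matrix (IS, IX, S, X), with ordinary reads taking IS and writes taking IX on the collection. The phase list is a simplified, hypothetical timeline following the MongoDB documentation's description, which includes an intermediate shared (S) lock before the final exclusive one. Treat the exact sequence as version-dependent.

```python
# Standard multiple-granularity lock compatibility: for each held mode, the
# set of modes another request may be granted at the same time.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}

# Hypothetical, simplified timeline of the collection lock a hybrid build holds.
BUILD_PHASES = [
    ("setup", "X"),
    ("scan, sort, bulk load, drain", "IX"),
    ("final side-write drain", "S"),
    ("commit", "X"),
]


def concurrent_access(build_mode):
    """Can ordinary reads (IS) and writes (IX) run alongside `build_mode`?"""
    granted = COMPATIBLE[build_mode]
    return {"reads": "IS" in granted, "writes": "IX" in granted}


for phase, mode in BUILD_PHASES:
    print(f"{phase:<30} {mode:<3} {concurrent_access(mode)}")
```

The long middle phase (IX) admits both reads and writes. Only the short S window blocks writes, and only the two short X windows block everything.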
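The core idea of the build flow can be shown with a toy, in-memory model. The index is built from a snapshot, while writes that arrive in the meantime are logged to a side-writes list and replayed at the end. ToyCollection and hybrid_build are hypothetical names. The model assumes every document has the indexed field, and it simulates concurrency by applying writes after the snapshot is taken. Real builds use WiredTiger, an external sorter, and the lock sequence described above.

```python
from bisect import insort


class ToyCollection:
    """In-memory stand-in for a collection: maps _id -> document dict."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.side_writes = None  # a list only while an index build is running

    def write(self, _id, doc):
        """Upsert `doc`, or delete the document when `doc` is None."""
        old = self.docs.get(_id)
        if doc is None:
            self.docs.pop(_id, None)
        else:
            self.docs[_id] = doc
        if self.side_writes is not None:
            # Mirrors step (3): log (old, new) instead of touching the index.
            self.side_writes.append((_id, old, doc))


def hybrid_build(coll, field, writes_during_build=()):
    """Return the finished index as a sorted list of (key, _id) pairs.

    Assumes every document has `field` and that keys and _ids are comparable.
    """
    # Mirrors step (1): snapshot the data and start logging side writes
    # (the real build does this under a brief exclusive lock).
    snapshot = list(coll.docs.items())
    coll.side_writes = []

    # Application writes keep landing while the build runs.
    for _id, doc in writes_during_build:
        coll.write(_id, doc)

    # Mirrors steps (2) and (4): scan the snapshot, sort keys, bulk-load them.
    index = sorted((doc[field], _id) for _id, doc in snapshot)

    # Mirrors step (5): drain side writes in arrival order.
    for _id, old, new in coll.side_writes:
        if old is not None:
            index.remove((old[field], _id))   # drop the stale key
        if new is not None:
            insort(index, (new[field], _id))  # add the current key

    # Mirrors steps (6) and (7): stop logging and "commit" the index.
    coll.side_writes = None
    return index


coll = ToyCollection({1: {"age": 30}, 2: {"age": 25}, 3: {"age": 41}})
index = hybrid_build(coll, "age", [(2, {"age": 26}), (4, {"age": 19}), (3, None)])
print(index)  # [(19, 4), (26, 2), (30, 1)]
```

The replay works because the first logged write for each _id carries the snapshot value as its "old" key. The finished index therefore matches the collection's final state even though it was built from a stale snapshot.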
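For MongoDB 4.4+ simultaneous builds, the primary's commit decision can be sketched as a quorum check. In mongosh, the quorum is passed as the third argument to db.collection.createIndex(). The function below is a hypothetical model of the documented quorum values "votingMembers", "majority", and an integer member count. Custom write-concern tag names are out of scope.

```python
def quorum_satisfied(commit_quorum, ready, voting_data_bearing):
    """Hypothetical check: may the primary commit the index build yet?

    commit_quorum: "votingMembers" (the default), "majority", or an int.
    ready: names of members that have finished building the index.
    voting_data_bearing: names of the data-bearing voting members.
    """
    # Only data-bearing voting members count toward the quorum.
    finished = len(ready & voting_data_bearing)
    total = len(voting_data_bearing)
    if commit_quorum == "votingMembers":
        return finished == total
    if commit_quorum == "majority":
        return finished > total // 2
    if isinstance(commit_quorum, int):
        return finished >= commit_quorum
    raise ValueError(f"unsupported commitQuorum: {commit_quorum!r}")


members = {"primary", "sec1", "sec2"}
done = {"primary", "sec1"}  # sec2 is still building (or is down)
print(quorum_satisfied("votingMembers", done, members))  # False
print(quorum_satisfied("majority", done, members))       # True
```

The default quorum has a practical consequence. If one data-bearing voting member is slow or unavailable, the primary keeps waiting to commit the build.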
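To turn currentOp output into a progress report, you could post-process the inprog array. In mongosh, that array comes from db.adminCommand({ currentOp: true, $or: [ { "command.createIndexes": { $exists: true } }, { op: "none", msg: /^Index Build/ } ] }).inprog. The helper names below are hypothetical, and so is the sample data. The sketch assumes that index-build entries may carry a progress subdocument with done and total counters, as recent versions report during the collection scan.

```python
def is_index_build(op):
    """Match either form an index build can take in currentOp output."""
    if "createIndexes" in op.get("command", {}):
        return True
    return op.get("op") == "none" and str(op.get("msg", "")).startswith("Index Build")


def index_build_progress(inprog):
    """Return (ns, opid, percent or None) for each index build in `inprog`."""
    rows = []
    for op in inprog:
        if not is_index_build(op):
            continue
        progress = op.get("progress") or {}
        total = progress.get("total")
        pct = round(100 * progress.get("done", 0) / total, 1) if total else None
        rows.append((op.get("ns"), op.get("opid"), pct))
    return rows


# Made-up sample shaped like currentOp entries, for illustration only.
sample = [
    {"opid": 11, "op": "command", "ns": "shop.orders",
     "command": {"createIndexes": "orders"}},
    {"opid": 12, "op": "none", "ns": "shop.orders",
     "msg": "Index Build: scanning collection",
     "progress": {"done": 250, "total": 1000}},
    {"opid": 13, "op": "query", "ns": "shop.users", "command": {"find": "users"}},
]
print(index_build_progress(sample))
```

The helper reports None instead of guessing when an entry has no progress counters, for example a createIndexes command that is only waiting on the build.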
Pro Tip
Demonstrate both theoretical understanding and practical experience. Say what it is, then give an example of how you actually used it in a MongoDB codebase.