How does MongoDB sharding distribute data internally?

Why Interviewers Ask This

Interviewers ask this to evaluate whether you have the depth of knowledge needed to mentor others and lead technical decisions. The expected answer goes beyond definitions into practical implications and real-world consequences.

Answer

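MongoDB sharding distributes data by dividing each sharded collection into chunks and letting a balancer spread those chunks across shards.

Chunks: a chunk is a contiguous range of shard key values. With a ranged shard key on an empty collection, there is initially a single chunk covering the entire key space (MinKey to MaxKey). The default chunk size is 128MB in MongoDB 6.0 and later (64MB in earlier versions). The config.chunks collection on the config servers maps each chunk range to the shard that owns it, and mongos routers cache this metadata to send each query to the right shard or shards.

Chunk splitting: before MongoDB 6.0.3, when a chunk grew beyond chunkSize, MongoDB automatically split it in two, choosing a split point near the median key so each half held roughly the same amount of data. Starting in MongoDB 6.0.3, this automatic splitting no longer happens on insert, and the balancer distributes data by size instead.

Balancer: a background process that runs on the config server replica set primary (on mongos before MongoDB 3.4) and periodically checks how each collection is spread across shards. Before 6.0.3 it compared chunk counts: migrations began once the gap between the shard with the most chunks and the shard with the fewest reached a migration threshold of 2 (fewer than 20 chunks), 4 (20 to 79 chunks), or 8 (80 or more chunks). In 6.0.3 and later it compares data size, treating a collection as balanced while the difference between its most- and least-loaded shards is below three times the configured chunk (range) size.

Migration: the source shard copies the chunk's documents to the destination shard while continuing to serve reads and writes, then forwards the writes that arrived during the copy. After a brief critical section in which writes to that range are blocked, the config metadata is updated to record the new owner, and the source shard deletes its now-orphaned copies asynchronously. Apart from that short critical section, migrations run in the background and are transparent to the application.

Jumbo chunks: a chunk that exceeds the maximum size but cannot be split, typically because every document in it has the same shard key value (the key's cardinality is too low or one value is too frequent), is marked jumbo. The balancer does not migrate jumbo chunks by default, so they stay on one shard and create hotspots. The fix is a shard key with higher cardinality, for example a compound key that adds a more selective field, chosen up front or applied later with reshardCollection (MongoDB 5.0+). Hashed sharding alone does not help here, because identical values produce identical hashes.

Zone sharding: you associate ranges of shard key values with zones and assign each zone to one or more shards; the balancer then keeps data in those ranges on the assigned shards. This supports geographic routing, such as keeping EU customers' data on shards hosted in the EU.

Shard key strategies: a hashed shard key ({ field: "hashed" }) spreads inserts evenly, even for monotonically increasing values such as ObjectIds or timestamps, but range queries on the key can no longer be targeted and are broadcast to all shards. A ranged shard key keeps adjacent values together, so range queries can be routed to one or a few shards, but a monotonically increasing key sends every new insert to the chunk holding the highest values, concentrating writes on a single shard.

Pre-splitting: for large initial imports into a collection with a ranged shard key, split the key space into chunks and distribute them across shards before loading, so writes are spread out from the start instead of waiting for the balancer to catch up.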
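The splitting, jumbo, and balancing behavior described above can be illustrated with a toy Python simulation. It is a minimal sketch under stated assumptions, not MongoDB's implementation: chunk size is measured in documents (the hypothetical MAX_DOCS) rather than megabytes, the split point is the median key, a "migration" is just a change of owner, and the balancer uses the pre-6.0.3 chunk-count thresholds (2, 4, or 8 depending on the total chunk count). The names Chunk, insert, split, and balance are illustrative only.

```python
import bisect
from dataclasses import dataclass, field

MAX_DOCS = 4  # hypothetical stand-in for chunkSize, counted in documents instead of MB


@dataclass
class Chunk:
    lo: float   # inclusive lower bound of the shard key range (-inf plays MinKey)
    hi: float   # exclusive upper bound (inf plays MaxKey)
    shard: str  # shard that currently owns the range
    keys: list = field(default_factory=list)  # sorted shard key values stored here
    jumbo: bool = False


def insert(chunks, key):
    """Route key to the chunk whose range contains it, then split if it grew too big."""
    i = bisect.bisect_right([c.lo for c in chunks], key) - 1
    bisect.insort(chunks[i].keys, key)
    if len(chunks[i].keys) > MAX_DOCS and not chunks[i].jumbo:
        split(chunks, i)


def split(chunks, i):
    c = chunks[i]
    if c.keys[0] == c.keys[-1]:
        c.jumbo = True  # every document has the same key: no boundary can separate them
        return
    mid = c.keys[len(c.keys) // 2]  # approximately the median key
    if mid == c.keys[0]:
        # Use the next distinct value so both halves are non-empty.
        mid = next(k for k in c.keys if k > mid)
    left = Chunk(c.lo, mid, c.shard, [k for k in c.keys if k < mid])
    right = Chunk(mid, c.hi, c.shard, [k for k in c.keys if k >= mid])
    chunks[i:i + 1] = [left, right]


def migration_threshold(total_chunks):
    # Pre-6.0.3 chunk-count thresholds.
    if total_chunks < 20:
        return 2
    return 4 if total_chunks < 80 else 8


def chunk_counts(chunks, shards):
    counts = {s: 0 for s in shards}
    for c in chunks:
        counts[c.shard] += 1
    return counts


def balance(chunks, shards):
    """One simplified balancer round; returns the number of migrations performed."""
    counts = chunk_counts(chunks, shards)
    if max(counts.values()) - min(counts.values()) < migration_threshold(len(chunks)):
        return 0
    moves = 0
    while True:
        counts = chunk_counts(chunks, shards)
        src = max(counts, key=counts.get)
        dst = min(counts, key=counts.get)
        candidate = next((c for c in chunks if c.shard == src and not c.jumbo), None)
        if counts[src] - counts[dst] < 2 or candidate is None:
            return moves
        # Reassigning the owner stands in for a real migration (copy, catch up, commit, clean up).
        candidate.shard = dst
        moves += 1


chunks = [Chunk(float("-inf"), float("inf"), "shardA")]
for key in range(1, 21):  # monotonically increasing keys always hit the top chunk
    insert(chunks, key)
print(len(chunks))                                 # 9
print(balance(chunks, ["shardA", "shardB"]))       # 4
print(chunk_counts(chunks, ["shardA", "shardB"]))  # {'shardA': 5, 'shardB': 4}

hot = [Chunk(float("-inf"), float("inf"), "shardA")]
for _ in range(10):
    insert(hot, 42)  # a single repeated shard key value
print(len(hot), hot[0].jumbo)                      # 1 True
```

Because the keys arrive in increasing order, every insert lands in the top chunk, which is the ranged-key hotspot in miniature. The second collection, with one repeated key value, ends up as a single unsplittable jumbo chunk that the balancer skips.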
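In MongoDB 6.0.3 and later, the balancer's decision rests on data size rather than chunk counts. The function below is a hypothetical sketch of that check, assuming the documented rule that a collection counts as balanced while the gap between its most- and least-loaded shards stays under three times the range size. With the default 128MB size, the threshold works out to 3 × 128MB = 384MB. The helper name and the MiB-based arithmetic are assumptions for illustration.

```python
MIB = 1024 * 1024


def collection_is_balanced(shard_bytes, range_size_mib=128):
    """Hypothetical helper: balanced while the most- and least-loaded shards
    differ by less than three times the range (chunk) size."""
    gap = max(shard_bytes.values()) - min(shard_bytes.values())
    return gap < 3 * range_size_mib * MIB


print(collection_is_balanced({"shardA": 900 * MIB, "shardB": 600 * MIB}))   # True: 300 < 384
print(collection_is_balanced({"shardA": 1000 * MIB, "shardB": 500 * MIB}))  # False: 500 >= 384
```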
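Zone sharding can be pictured as a lookup from shard key ranges to the shards allowed to own them. The sketch below assumes a hypothetical compound shard key of (region, userId), modelled as Python tuples so that tuple ordering plays the role of shard key ordering; the zone names, shard names, and the helpers zone_for and eligible_shards are all made up for illustration. Keys not covered by any zone may be balanced onto any shard.

```python
# Hypothetical zone ranges on a (region, userId) shard key: [lower, upper) -> zone.
ZONE_RANGES = [
    (("EU", float("-inf")), ("EU", float("inf")), "EU"),
    (("US", float("-inf")), ("US", float("inf")), "US"),
]

# Hypothetical shard-to-zone assignment; a shard may belong to several zones or none.
SHARD_ZONES = {
    "shard-eu-1": {"EU"},
    "shard-us-1": {"US"},
    "shard-any": set(),
}


def zone_for(key):
    """Return the zone whose range contains key, or None if no zone covers it."""
    for lower, upper, zone in ZONE_RANGES:
        if lower <= key < upper:
            return zone
    return None


def eligible_shards(key):
    """Shards the balancer may place this key's chunk on."""
    zone = zone_for(key)
    if zone is None:
        return sorted(SHARD_ZONES)  # data outside every zone can live on any shard
    return sorted(s for s, zones in SHARD_ZONES.items() if zone in zones)


print(eligible_shards(("EU", 42)))   # ['shard-eu-1']
print(eligible_shards(("APAC", 7)))  # ['shard-any', 'shard-eu-1', 'shard-us-1']
```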
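To make the hashed versus ranged trade-off concrete, the following sketch routes the same keys both ways. It assumes four hypothetical shards with equal-width chunks, and uses a truncated MD5 digest as a stand-in for MongoDB's internal 64-bit hash; hashed sharding is modelled as range-based routing on the hash value rather than on the key itself. All names are illustrative.

```python
import bisect
import hashlib
from collections import Counter

SHARDS = ["shardA", "shardB", "shardC", "shardD"]

# Hypothetical ranged layout: chunk i covers [RANGE_BOUNDS[i], RANGE_BOUNDS[i + 1])
# and lives on SHARDS[i]; the last chunk is open-ended (up to MaxKey).
RANGE_BOUNDS = [float("-inf"), 250, 500, 750]

# Hashed layout: same idea, but the boundaries split the 64-bit hash space evenly.
HASH_BOUNDS = [i * 2**62 for i in range(4)]


def key_hash(key: int) -> int:
    # Stand-in 64-bit hash (truncated MD5); MongoDB uses its own hash function.
    return int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")


def ranged_route(key: int) -> str:
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key) - 1]


def hashed_route(key: int) -> str:
    return SHARDS[bisect.bisect_right(HASH_BOUNDS, key_hash(key)) - 1]


def shards_holding(lo: int, hi: int, route) -> set:
    """Shards owning at least one key in [lo, hi): a range query must reach all of them."""
    return {route(k) for k in range(lo, hi)}


new_ids = range(1000, 1100)  # monotonically increasing keys, like timestamps
print(Counter(ranged_route(k) for k in new_ids))  # every insert lands on shardD
print(Counter(hashed_route(k) for k in new_ids))  # inserts spread across shards
print(shards_holding(100, 200, ranged_route))     # {'shardA'}
print(len(shards_holding(100, 200, hashed_route)))
```

Under ranged routing, all new increasing keys pile onto the shard that owns the top chunk, while hashing scatters them. The cost shows up in the range query: keys 100 to 199 sit on one shard under ranged routing but are spread over several shards under hashed routing, which is why mongos has to broadcast such queries.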

Pro Tip

Back up your answer with a specific project or situation. Saying 'In my last MongoDB project, I used this when...' immediately makes your answer more credible and memorable.