What is MongoDB sharding?

Why Interviewers Ask This

This is a classic screening question for MongoDB roles. Hiring managers ask it early in interviews to gauge your baseline understanding and determine if you can communicate technical concepts clearly.

Answer

Sharding is MongoDB's horizontal scaling mechanism: it distributes a collection's data across multiple servers called shards. Each shard holds a subset of the data and is deployed as a replica set.

Components of a sharded cluster:

(1) Shards: each stores a portion of the data and is itself a replica set.
(2) Config servers: store cluster metadata, including which chunks live on which shards. They must be deployed as a replica set; three members is the usual production setup for high availability.
(3) mongos (query router): the interface between client applications and the cluster. It routes queries to the correct shards and merges the results. Applications connect to mongos, not to the shards directly.

Shard key: the field (or fields) MongoDB uses to distribute documents across shards. Choosing the right shard key is critical because it affects data distribution, query routing, and performance. A good shard key has high cardinality, spreads writes evenly, and appears in most queries, so mongos can target specific shards instead of broadcasting to all of them. A bad shard key is either monotonically increasing (a timestamp under range-based sharding sends every new write to the shard that owns the highest range, creating a "hot shard") or low in cardinality (a boolean has only two values, so the data can be split across at most two shards).

Chunk: a contiguous range of shard key values, and the unit of data migration between shards. MongoDB automatically balances chunks across shards in the background.

Range-based sharding: documents with nearby shard key values are stored together, which makes range queries on the shard key efficient. Hashed sharding: a hash of the shard key value determines placement, which gives a more even distribution, but range queries on the shard key can no longer be targeted and typically go to every shard.

When to shard: when a single replica set can no longer handle the data volume or the query throughput, or when the working set no longer fits in RAM.
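To make the "hot shard" point concrete in an interview, it can help to walk through a small simulation. The sketch below is plain Python and does not talk to MongoDB: the shard names, the split points, and the SHA-256-plus-modulo routing are all assumptions chosen for illustration, not MongoDB's actual hash function or chunk layout, and chunk splitting and balancing are left out. It only shows why a monotonically increasing key concentrates new writes under range-based routing and spreads them under hashed routing.

```python
import bisect
import hashlib
from collections import Counter
from typing import Sequence

# Hypothetical three-shard cluster; the names are illustrative only.
SHARDS = ["shardA", "shardB", "shardC"]

# Assumed chunk boundaries: keys < 1000 -> shardA,
# 1000..1999 -> shardB, keys >= 2000 -> shardC.
SPLIT_POINTS = [1000, 2000]


def range_shard(key: int, split_points: Sequence[int]) -> str:
    """Route a key to the shard that owns the range containing it."""
    return SHARDS[bisect.bisect_right(split_points, key)]


def hashed_shard(key: int) -> str:
    """Route a key by hashing it (a stand-in for hashed sharding)."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# New inserts with a monotonically increasing key, such as a
# timestamp-like counter that is already past the last split point.
new_keys = range(5000, 5300)

range_counts = Counter(range_shard(k, SPLIT_POINTS) for k in new_keys)
hashed_counts = Counter(hashed_shard(k) for k in new_keys)

print("range-based:", dict(range_counts))
print("hashed:     ", dict(hashed_counts))
```

Under range-based routing every new key lands on the shard that owns the top range, while hashed routing spreads the same keys across all three shards. The trade-off is that consecutive keys end up on different shards, which is why range queries on the shard key lose their efficiency under hashed sharding.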

Common Mistake

Candidates often give textbook answers here. Interviewers are more impressed when you relate the concept to a specific problem you solved in a real MongoDB project.