What is MongoDB sharding?

Answer

Sharding is MongoDB's horizontal scaling mechanism: it distributes data across multiple servers (shards). Each shard is a replica set that holds a subset of the data.

Components of a sharded cluster:
(1) Shards: each stores a portion of the data (each is a replica set).
(2) Config servers: store cluster metadata, including which chunks are on which shards. They must be deployed as a replica set; three members is the usual production setup for high availability.
(3) mongos (query router): the interface between client applications and the cluster. It routes queries to the correct shards and merges the results. Applications connect to mongos, not to the shards directly.

Shard key: the field (or fields) MongoDB uses to distribute documents across shards. Choosing the right shard key is critical because it affects data distribution, query routing, and performance.
Good shard key: high cardinality, even write distribution, frequently used in queries.
Bad shard key: monotonically increasing (like a timestamp; with range-based sharding, all new writes land on the shard that owns the highest range, creating a "hot shard"), or low cardinality (like a boolean; with only two distinct values the data can be split into at most two chunks, so at most two shards ever hold data).

Chunk: a contiguous range of shard key values; the unit of data migration between shards. MongoDB's balancer migrates chunks automatically to keep data evenly distributed across shards.

Range-based sharding: documents with nearby shard key values are grouped together, which makes range queries efficient.
Hashed sharding: a hash of the shard key value determines placement, which gives a more even distribution, but range queries on the shard key can no longer be targeted to a small number of shards.

When to shard: when a single replica set can no longer handle the data volume or query throughput, or when the working set no longer fits in RAM.
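To make the shard key trade-offs concrete, here is a minimal Python sketch that simulates placement without a real cluster. Everything in it is an assumption for illustration: the function names (range_shard, hashed_shard), the four shards, the range boundaries, and the use of MD5 with a modulo. MongoDB's actual hashed sharding uses its own hash function and assigns ranges of hashed values to chunks, so treat this as a toy model of the idea rather than the real algorithm.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

# Hypothetical range boundaries: shard 0 owns keys < 250, shard 1 owns
# [250, 500), shard 2 owns [500, 750), shard 3 owns everything >= 750.
BOUNDARIES = [250, 500, 750]


def range_shard(key, boundaries=BOUNDARIES):
    """Range-based placement: pick the shard whose key range contains `key`."""
    return bisect_right(boundaries, key)


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Toy hashed placement: hash the key, then map the hash to a shard.

    Simplification: real MongoDB splits the hashed value space into chunk
    ranges instead of taking a modulo.
    """
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Monotonically increasing keys (think timestamps or counters): every new
# key is larger than all existing ones.
new_keys = range(1000, 2000)

range_counts = Counter(range_shard(k) for k in new_keys)
hashed_counts = Counter(hashed_shard(k) for k in new_keys)

# Low-cardinality key: a boolean has only two distinct values, so it can
# never map to more than two shards, however many shards exist.
bool_counts = Counter(hashed_shard(flag) for flag in [True, False] * 500)

print("range-based, increasing keys:", dict(range_counts))
print("hashed, increasing keys:     ", dict(hashed_counts))
print("hashed, boolean key:         ", dict(bool_counts))
```

With range-based placement all 1,000 new writes fall into the last range, which is the hot shard problem; the hashed version spreads the same keys over all four shards. The boolean key shows the cardinality limit: adding more shards would not help, because there are only two distinct values to place.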

