How does MongoDB replication work internally?

Answer

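MongoDB replication is asynchronous and pull-based: secondaries copy the primary's oplog (operations log) and replay it, so reads from secondaries are eventually consistent. Leader election uses a Raft-inspired protocol.

Oplog structure: the oplog (local.oplog.rs) is a capped collection, so once it reaches its configured size the oldest entries are overwritten. Every write is stored as an idempotent entry; for example, an $inc is logged as the field's resulting value rather than as the increment. Entries include: ts (timestamp), t (election term), h (hash, present in older versions), op (operation type: i = insert, u = update, d = delete, c = command, n = no-op), ns (namespace), o (the operation's payload, such as the inserted document, the update modification, or the _id of the deleted document), and o2 (for updates, the criteria identifying the target document, typically its _id).

Secondary replication flow: each secondary runs a dedicated thread, the "oplog fetcher", that pulls new entries from its sync source (usually the primary, or another secondary when chained replication is in use) through a tailable, await-data cursor: like tail -f, but over the network. Fetched entries are buffered and applied in batches to the secondary's data and to its own oplog.

Oplog application: each batch is applied by several writer threads in parallel, but operations on the same document are grouped onto one thread so they keep their original order, and entries are recorded in the secondary's oplog in timestamp order. Because entries are idempotent, re-applying an entry (for example, after a restart in the middle of a batch) produces the same result.

Election protocol (Raft-based): each member has a priority (default 1; a priority-0 member can never become primary, and higher-priority members are preferred as primary). Members exchange heartbeats every 2 seconds; if the primary does not respond within electionTimeoutMillis (default 10000 ms, i.e. 10 s), an eligible secondary calls an election. The candidate increments its term and requests votes from its peers. A member votes only once per term, for the first candidate that asks, and only if that candidate's oplog is at least as up-to-date as its own. The candidate that gathers votes from a majority of voting members becomes primary.

Replication lag monitoring: rs.printSecondaryReplicationInfo() shows how far each secondary's oplog lags behind the primary.

Reading from secondaries: because secondaries may lag, reads with readPreference: "secondary" may return stale data. Use readPreference: "secondaryPreferred" to read from a secondary when one is available and fall back to the primary otherwise; plain "secondary" fails when no secondary is available.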
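A minimal Python sketch of why idempotent oplog entries matter. It is a toy model, not MongoDB's code: entries are plain dicts using the op/ns/o/o2 field names, the collection is an in-memory dict, inserts are applied upsert-style, and updates are written as $set for readability (real server versions record updates in their own formats). The update entry carries the final value (count = 6) instead of the client's original $inc, so replaying it leaves the document unchanged.

```python
# Toy model of idempotent oplog application (not MongoDB's internal code).
# Entries are dicts using the oplog field names ts, op, ns, o, o2.
from typing import Any

Collection = dict[Any, dict]  # maps _id -> document


def apply_entry(coll: Collection, entry: dict) -> None:
    """Apply one simplified oplog entry to an in-memory collection."""
    op = entry["op"]
    if op == "i":    # insert: o is the full document (written upsert-style here)
        doc = entry["o"]
        coll[doc["_id"]] = dict(doc)
    elif op == "u":  # update: o2 identifies the document, o carries final values
        _id = entry["o2"]["_id"]
        coll.setdefault(_id, {"_id": _id}).update(entry["o"]["$set"])
    elif op == "d":  # delete: o holds the _id; deleting twice is harmless
        coll.pop(entry["o"]["_id"], None)
    elif op == "n":  # no-op entry
        pass
    else:
        raise ValueError(f"op not modeled in this sketch: {op!r}")


def apply_batch(coll: Collection, batch: list[dict]) -> None:
    """Apply a batch in timestamp order."""
    for entry in sorted(batch, key=lambda e: e["ts"]):
        apply_entry(coll, entry)


# A client ran {$inc: {count: 1}} against {_id: 1, count: 5}. The oplog
# records the result (count = 6), not the increment.
oplog = [
    {"ts": 1, "op": "i", "ns": "shop.carts", "o": {"_id": 1, "count": 5}},
    {"ts": 2, "op": "u", "ns": "shop.carts", "o2": {"_id": 1}, "o": {"$set": {"count": 6}}},
]

secondary: Collection = {}
apply_batch(secondary, oplog)
apply_entry(secondary, oplog[1])  # replay the update, e.g. after a restart
print(secondary)  # {1: {'_id': 1, 'count': 6}}
```

Had the entry stored the increment itself, the replay in the last step would have produced count = 7.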
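The fetch-and-apply loop can be modeled in a few lines. In this hypothetical sketch, Oplog stands in for the capped local.oplog.rs (a bounded deque), read_after plays the role of the tailable cursor resuming after the last applied timestamp, and timestamps are assumed to be consecutive integers, which real oplog timestamps are not. Lag is counted in timestamps here, whereas rs.printSecondaryReplicationInfo() reports it as time. The sketch also shows a consequence of the oplog being capped: a member whose next needed entry has already been overwritten cannot catch up by tailing.

```python
# Toy model of the pull-based fetch/apply loop (hypothetical names, not
# MongoDB's code). Assumes timestamps are consecutive integers.
from collections import deque


class Oplog:
    """Capped log: once max_entries is reached, the oldest entry is dropped."""

    def __init__(self, max_entries: int) -> None:
        self.entries: deque = deque(maxlen=max_entries)

    def append(self, entry: dict) -> None:
        self.entries.append(entry)

    def read_after(self, ts: int, limit: int) -> list[dict]:
        """Resume after ts, like a tailable cursor, returning up to `limit` entries."""
        if self.entries and ts + 1 < self.entries[0]["ts"]:
            # The entry this reader needs next has already been overwritten.
            raise RuntimeError("reader is too stale to catch up by tailing")
        return [e for e in self.entries if e["ts"] > ts][:limit]


class Secondary:
    def __init__(self) -> None:
        self.last_applied = 0
        self.data: dict = {}

    def fetch_and_apply(self, source: Oplog, batch_size: int = 2) -> int:
        """Fetch one batch from the sync source and apply it in order."""
        batch = source.read_after(self.last_applied, batch_size)
        for entry in batch:
            self.data[entry["o"]["_id"]] = entry["o"]  # inserts only, for brevity
            self.last_applied = entry["ts"]
        return len(batch)


def lag(source: Oplog, member: Secondary) -> int:
    """Gap between the newest entry on the source and the member's last applied one."""
    newest = source.entries[-1]["ts"] if source.entries else 0
    return newest - member.last_applied


primary = Oplog(max_entries=5)
for ts in range(1, 5):
    primary.append({"ts": ts, "op": "i", "o": {"_id": ts}})

sec = Secondary()
sec.fetch_and_apply(primary)
print(lag(primary, sec))  # 2
sec.fetch_and_apply(primary)
print(lag(primary, sec))  # 0
```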
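The vote-granting rule can be sketched as follows. This is a simplified Raft-style model, not MongoDB's implementation: the class and function names are hypothetical, an oplog position is a (term, timestamp) pair compared term first, and heartbeats, priorities, and MongoDB's preliminary dry-run election are left out.

```python
# Simplified Raft-style vote granting (hypothetical names, not MongoDB's code).
# Omits heartbeats, priorities, and the preliminary dry-run election.
from dataclasses import dataclass
from typing import Optional

OpTime = tuple[int, int]  # (term, timestamp) of a member's newest oplog entry


@dataclass
class Member:
    name: str
    term: int = 0
    voted_for: Optional[str] = None
    last_optime: OpTime = (0, 0)

    def request_vote(self, candidate: str, term: int, cand_optime: OpTime) -> bool:
        if term < self.term:
            return False                    # candidate is from an old term
        if term > self.term:
            self.term, self.voted_for = term, None  # new term: vote is free again
        if self.voted_for not in (None, candidate):
            return False                    # already voted for someone else this term
        if cand_optime < self.last_optime:
            return False                    # candidate's oplog is behind ours
        self.voted_for = candidate
        return True


def run_election(candidate: Member, peers: list[Member]) -> bool:
    """Return True if the candidate wins a strict majority of all voting members."""
    candidate.term += 1
    candidate.voted_for = candidate.name    # a candidate votes for itself
    votes = 1 + sum(
        peer.request_vote(candidate.name, candidate.term, candidate.last_optime)
        for peer in peers
    )
    return votes > (len(peers) + 1) // 2


a = Member("A", term=1, last_optime=(1, 100))
b = Member("B", term=1, last_optime=(1, 100))
c = Member("C", term=1, last_optime=(1, 90))
lost = run_election(c, [a, b])  # False: A and B have newer oplogs than C
won = run_election(a, [b, c])   # True: A is at least as up to date as B and C
print(lost, won)                # False True
```

The term check makes each member's vote single-use per term, and the oplog comparison keeps a member that missed recent writes from becoming primary over members that have them.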
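The fallback behavior of the read preference modes can be sketched as a selection function. This is a hypothetical simplification: member names are plain strings, the "nearest" mode is not modeled, and real drivers also filter by latency (and optionally maxStalenessSeconds) and choose randomly among the eligible members rather than taking the first.

```python
from typing import Optional


def select_member(mode: str, primary: Optional[str], secondaries: list[str]) -> str:
    """Return the member to read from; `primary` is None when no primary is reachable."""
    primary_only = [primary] if primary else []
    if mode == "primary":
        candidates = primary_only
    elif mode == "primaryPreferred":
        candidates = primary_only or secondaries
    elif mode == "secondary":
        candidates = secondaries          # never falls back to the primary
    elif mode == "secondaryPreferred":
        candidates = secondaries or primary_only
    else:
        raise ValueError(f"mode not modeled in this sketch: {mode!r}")
    if not candidates:
        raise LookupError(f"no member matches readPreference {mode!r}")
    return candidates[0]  # simplification: drivers pick at random within a latency window


print(select_member("secondaryPreferred", "A", ["B", "C"]))  # B
print(select_member("secondaryPreferred", "A", []))          # A
```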

