What is the Raft consensus algorithm?

Why Interviewers Ask This

Mid-level system design roles expect a solid understanding of how replicated systems reach agreement. Interviewers ask this to separate candidates who can explain the mechanics of Raft (elections, log replication, terms) from those who only know that it "keeps nodes in sync."

Answer

Raft is a consensus algorithm designed to be more understandable than Paxos. It ensures that a cluster of nodes agrees on the same sequence of log entries even when some nodes fail. It is used in etcd (the store behind Kubernetes), CockroachDB, TiKV, and Consul.

Key concepts:

(1) Leader election: one node is the leader and handles all client requests. Nodes start as followers. A follower that hears nothing from a leader within its election timeout becomes a candidate, increments its term, and requests votes. A candidate that receives votes from a majority of the cluster becomes the new leader.

(2) Log replication: client writes go to the leader. The leader appends the entry to its log and sends AppendEntries RPCs to followers. Once the entry is stored on a majority of the cluster (the leader counts itself), it is "committed": the leader applies it to its state machine and responds to the client, and followers apply it once they learn it is committed.

(3) Safety: a node grants its vote only to a candidate whose log is at least as up-to-date as its own, so any elected leader already holds every committed entry. This ensures committed entries are never lost.

Terms: Raft divides time into terms, and each election starts a new term. Terms detect stale leaders: a node that receives a message carrying a higher term than its own knows it is out of date and steps down to follower, while messages carrying a lower term are rejected.

Guarantees: if N is the cluster size, Raft tolerates ⌊(N-1)/2⌋ failures, because a majority must be alive to elect a leader and commit entries. A 5-node cluster, for example, tolerates 2 failures.

Comparison with Paxos: Paxos is the theoretical foundation (more flexible but more complex); Raft is a practical, more readily implementable alternative.

Multi-Raft: divide data into many Raft groups (for example, one per key range), each electing its own leader independently. This lets distributed databases scale beyond a single Raft group.
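The majority arithmetic, the term check, and the "up-to-date log" voting rule from the answer can be sketched in a few lines of Python. This is an illustrative sketch only, not taken from any real Raft implementation: the names `quorum`, `tolerated_failures`, `Node`, and `handle_vote_request` are hypothetical, and persistence, timeouts, and networking are all left out.

```python
from dataclasses import dataclass
from typing import Optional


def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """floor((N-1)/2): how many nodes can fail while a majority survives."""
    return (cluster_size - 1) // 2


@dataclass
class Node:
    """Hypothetical, in-memory view of the state a node needs in order to vote."""
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_term: int = 0   # term of the last entry in this node's log
    last_log_index: int = 0  # index of the last entry in this node's log

    def handle_vote_request(self, candidate_id: str, term: int,
                            last_log_term: int, last_log_index: int) -> bool:
        # A request carrying a lower term comes from a stale candidate.
        if term < self.current_term:
            return False
        # A higher term means this node is behind: adopt the new term
        # and forget any vote cast in the old one.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # Safety rule: the candidate's log must be at least as up-to-date.
        # Compare the last entry's term first, then the log length.
        log_ok = (last_log_term, last_log_index) >= (
            self.last_log_term, self.last_log_index)
        # At most one vote per term.
        if log_ok and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False
```

With these definitions, a 5-node cluster has a quorum of 3 and tolerates 2 failures, while a 4-node cluster also needs 3 votes but tolerates only 1 failure, which is why odd cluster sizes are the usual choice.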

Common Mistake

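A common mistake is memorizing definitions without understanding their implications. When asked this question, go one level deeper and explain what each rule prevents. For example, the majority requirement means a partitioned minority can neither elect a leader nor commit writes, which is what rules out split brain.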