What are Kafka's ISR (In-Sync Replicas) management and unclean leader election?

Answer

The ISR (In-Sync Replicas) is the dynamic set of replicas for a partition, the leader included, that are caught up with the leader's log. A follower is dropped from the ISR if it has not caught up to the leader's log end offset within replica.lag.time.max.ms (default 30 seconds), and it rejoins once it catches up again. By default, only ISR members are eligible for leader election. Unclean leader election (unclean.leader.election.enable, default: false) changes this: if the leader fails and no ISR member is available, Kafka may elect an out-of-sync replica as the new leader. This risks data loss, because messages that the old leader acknowledged but the new leader never replicated are permanently lost. With the setting disabled, the partition instead stays offline until an ISR member comes back. For systems where data loss is unacceptable (banking, payments), keep unclean.leader.election.enable=false. For systems where availability matters more than durability (metrics collection), enabling it avoids an outage when all ISR members fail. Monitor the UnderReplicatedPartitions JMX metric; it should be 0 in steady state.
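
The ISR rule and the election rule described above can be sketched as a small toy model. This is not Kafka code: the function names (current_isr, elect_leader), the broker IDs, and the timestamps are hypothetical, and the preference for the first eligible replica in assignment order is a simplifying assumption. It only illustrates how the lag threshold and the unclean.leader.election.enable flag interact.

```python
from typing import Dict, Optional, Sequence, Set

# Mirrors the default of replica.lag.time.max.ms (30 seconds).
REPLICA_LAG_TIME_MAX_MS = 30_000


def current_isr(leader: int,
                last_caught_up_ms: Dict[int, int],
                now_ms: int,
                max_lag_ms: int = REPLICA_LAG_TIME_MAX_MS) -> Set[int]:
    """Return the leader plus every follower that caught up recently enough.

    last_caught_up_ms maps a follower's broker ID to the (hypothetical)
    timestamp at which it last matched the leader's log end offset.
    """
    isr = {leader}
    for broker_id, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= max_lag_ms:
            isr.add(broker_id)
    return isr


def elect_leader(assigned: Sequence[int],
                 isr: Set[int],
                 live: Set[int],
                 unclean_enabled: bool) -> Optional[int]:
    """Pick a new leader, or return None if the partition must stay offline."""
    # Clean election: the first assigned replica that is alive and in sync.
    for broker_id in assigned:
        if broker_id in live and broker_id in isr:
            return broker_id
    # Unclean election: fall back to any live replica, accepting data loss.
    if unclean_enabled:
        for broker_id in assigned:
            if broker_id in live:
                return broker_id
    return None


if __name__ == "__main__":
    # Broker 1 leads; broker 2 caught up 5 s ago, broker 3 caught up 40 s ago.
    isr = current_isr(1, {2: 95_000, 3: 60_000}, now_ms=100_000)
    print("ISR:", sorted(isr))
    # Brokers 1 and 2 are down, so only the out-of-sync broker 3 is alive.
    print("unclean disabled:", elect_leader([1, 2, 3], isr, {3}, False))
    print("unclean enabled:", elect_leader([1, 2, 3], isr, {3}, True))
```

In this model, losing both ISR members leaves the partition without a leader when the flag is off, and hands leadership to the lagging broker 3 when it is on. That is the availability versus durability trade-off in miniature.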