How do you implement a distributed WebSocket system with message ordering guarantees?

Answer

Guaranteeing message order in a distributed WebSocket system is hard because messages for the same conversation can be published by different server instances and fanned out over Redis pub/sub, so network delays can cause them to reach a client in a different order than they were sent. The following building blocks address this and can be combined:

(1) Monotonic sequence numbers: a central or distributed counter (for example Redis INCR on a per-conversation key) assigns an incrementing sequence number to every message in each conversation/channel.
(2) Client-side reordering buffer: clients buffer received messages and deliver them in sequence-number order, waiting up to a configurable timeout for missing messages before skipping the gap or requesting a resend.
(3) Server-side ordering: route all messages for a conversation through a single actor/worker (sharded by conversation ID) so they are processed in order before being broadcast.
(4) Kafka as message bus: replace Redis pub/sub with Apache Kafka, using the conversation ID as the partition key. Kafka guarantees order within a partition, so all messages for one conversation stay ordered. Each WebSocket server instance consumes from the partitions that hold the conversations of its connected clients.
(5) Event sourcing: store events in an ordered append-only log. Clients subscribe from a specific offset, so a client that reconnects can replay from its last-seen offset without missing or reordering events.
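As a rough illustration of the client-side reordering buffer in (2), here is a minimal Python sketch. It assumes sequence numbers start at 1 (as a fresh Redis INCR key would) and that the caller passes in the current time. The class name `ReorderBuffer`, its methods, and the `gap_timeout` parameter are hypothetical names chosen for this example, not part of any library. When the timeout expires it simply skips the gap; a real client might instead ask the server to resend.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ReorderBuffer:
    """Delivers messages in sequence order; skips a gap after gap_timeout seconds."""
    next_seq: int = 1          # assumption: the first message has sequence number 1
    gap_timeout: float = 2.0   # assumption: how long to wait for a missing message
    _pending: Dict[int, Any] = field(default_factory=dict)
    _gap_since: Optional[float] = None  # when the current gap was first observed

    def push(self, seq: int, payload: Any, now: float) -> List[Any]:
        """Accept one received message; return whatever is now deliverable, in order."""
        if seq < self.next_seq or seq in self._pending:
            return []  # duplicate, or already delivered
        self._pending[seq] = payload
        return self._drain(now)

    def poll(self, now: float) -> List[Any]:
        """Call periodically; gives up on a missing message once the timeout passes."""
        waited_too_long = (
            self._pending
            and self._gap_since is not None
            and now - self._gap_since >= self.gap_timeout
        )
        if waited_too_long:
            self.next_seq = min(self._pending)  # skip over the missing messages
            self._gap_since = None
        return self._drain(now)

    def _drain(self, now: float) -> List[Any]:
        ready: List[Any] = []
        while self.next_seq in self._pending:
            ready.append(self._pending.pop(self.next_seq))
            self.next_seq += 1
        if not self._pending:
            self._gap_since = None      # nothing buffered, so no gap to time
        elif ready or self._gap_since is None:
            self._gap_since = now       # a new gap starts being timed now
        return ready


buf = ReorderBuffer(gap_timeout=2.0)
early = buf.push(2, "b", now=0.0)       # held back: message 1 has not arrived
in_order = buf.push(1, "a", now=0.5)    # releases both 1 and 2
held = buf.push(5, "e", now=1.0)        # held back: 3 and 4 are missing
still_waiting = buf.poll(now=2.0)       # only 1.0 s elapsed, keep waiting
after_timeout = buf.poll(now=3.0)       # 2.0 s elapsed, skip 3 and 4
```

Passing `now` explicitly keeps the buffer independent of any particular clock or event loop. In a real client, `push` would be called from the WebSocket message handler and `poll` from a timer.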