How do you scale WebSocket servers horizontally?

Answer

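Horizontal scaling of WebSocket servers is mainly a state-distribution problem. Each connection is long-lived, stateful, and held open by exactly one server instance, so anything the other instances need (broadcasts, session data) must be shared explicitly. A typical architecture combines:
(1) Sticky sessions at the load balancer (NGINX ip_hash, AWS ALB cookie-based stickiness), so a client's HTTP requests reach the same server. An upgraded WebSocket connection already stays on one server for its lifetime. Stickiness is what keeps multi-request transports working, such as Socket.IO's HTTP long-polling fallback, and Socket.IO requires it unless clients are limited to the WebSocket transport. Note that ip_hash sends every client behind one NAT or proxy address to the same node.
(2) Shared pub/sub via Redis: with the Socket.IO Redis adapter or custom Redis Pub/Sub, each server publishes broadcasts to a shared channel, and every server forwards what it receives to its own local clients, so a message reaches clients on any node. Redis Pub/Sub is fire-and-forget, so messages published while a server is disconnected from Redis are lost.
(3) Shared session state: store authentication and user data in Redis so any server can validate a new or reconnecting client, whichever node it lands on.
(4) Connection limits per node: as a rough rule of thumb, a single Node.js process handles about 10K to 100K concurrent connections, depending on per-connection memory, message frequency, and the OS file-descriptor limit. Use the Node.js cluster module or PM2 to run one process per CPU core. Each worker is a separate process, so the stickiness and pub/sub requirements above also apply between workers on the same machine.
(5) The Kubernetes Horizontal Pod Autoscaler (HPA), with sticky sessions via NGINX Ingress cookie affinity. HPA is commonly driven by CPU utilization, but a custom metric such as active connections per pod tracks WebSocket load more directly. Scaling does not move existing connections. New pods receive only new connections, and scale-in drops the connections on removed pods, so clients need reconnect logic.
(6) A dedicated WebSocket gateway, separate from the stateless REST API, that scales independently based on connection-count metrics.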
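Point (2) can be sketched without a Redis server by using an in-memory bus as a stand-in. The names InMemoryBus, FakeSocket, WsNode and the channel name are hypothetical, and this is not the Socket.IO adapter's code. It only illustrates the pattern: a node publishes a broadcast once, and every node delivers it to the sockets it holds locally.

```typescript
// Hypothetical names throughout: InMemoryBus stands in for a Redis Pub/Sub
// connection, and FakeSocket stands in for a real WebSocket.
type Handler = (message: string) => void;

class InMemoryBus {
  private subscribers = new Map<string, Handler[]>();

  subscribe(channel: string, handler: Handler): void {
    const handlers = this.subscribers.get(channel) ?? [];
    handlers.push(handler);
    this.subscribers.set(channel, handlers);
  }

  // Like Redis PUBLISH: delivered only to subscribers present right now,
  // with no persistence or retry.
  publish(channel: string, message: string): void {
    for (const handler of this.subscribers.get(channel) ?? []) {
      handler(message);
    }
  }
}

class FakeSocket {
  received: string[] = [];

  send(data: string): void {
    this.received.push(data);
  }
}

interface Envelope {
  room: string;
  payload: string;
}

const CHANNEL = "ws:broadcast"; // hypothetical channel name

// One WebSocket server node: it only tracks its own local sockets.
class WsNode {
  private rooms = new Map<string, FakeSocket[]>();
  private bus: InMemoryBus;

  constructor(bus: InMemoryBus) {
    this.bus = bus;
    // Every node subscribes and fans incoming broadcasts out to local sockets.
    bus.subscribe(CHANNEL, (raw) => {
      const envelope = JSON.parse(raw) as Envelope;
      for (const socket of this.rooms.get(envelope.room) ?? []) {
        socket.send(envelope.payload);
      }
    });
  }

  join(room: string, socket: FakeSocket): void {
    const members = this.rooms.get(room) ?? [];
    members.push(socket);
    this.rooms.set(room, members);
  }

  // Publish instead of looping over local sockets, so clients on other nodes
  // get the message too; this node receives it via its own subscription.
  broadcast(room: string, payload: string): void {
    const envelope: Envelope = { room, payload };
    this.bus.publish(CHANNEL, JSON.stringify(envelope));
  }
}

const bus = new InMemoryBus();
const nodeA = new WsNode(bus);
const nodeB = new WsNode(bus);
const alice = new FakeSocket(); // connected to node A
const bob = new FakeSocket(); // connected to node B
const carol = new FakeSocket(); // connected to node B, different room
nodeA.join("chat", alice);
nodeB.join("chat", bob);
nodeB.join("news", carol);
nodeA.broadcast("chat", "hello");
console.log(alice.received.join(","), bob.received.join(","), carol.received.length); // hello hello 0
```

In this sketch the publishing node also receives its own message through its subscription, which keeps delivery in a single code path. An alternative is to deliver locally first and have each node ignore messages it published itself.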
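Point (1) depends on the load balancer mapping a client key (an IP address, a cookie, a session ID) to one backend. The sketch below does not implement NGINX ip_hash or ALB stickiness. It uses rendezvous (highest-random-weight) hashing, a swapped-in technique, to show one property worth checking in any hash-based affinity scheme: how many clients get remapped when a node leaves. The backend names, client keys, and the FNV-1a hash are assumptions chosen for illustration.

```typescript
// FNV-1a (32-bit) string hash; chosen only because it is short and
// dependency-free, not because any load balancer uses it.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Rendezvous hashing: score every node against the client key and pick the
// highest score. Removing a node only moves the clients that were on it.
function pickNode(clientKey: string, nodes: string[]): string {
  if (nodes.length === 0) throw new Error("no backend nodes available");
  let best = nodes[0];
  let bestScore = -1;
  for (const node of nodes) {
    const score = fnv1a(`${node}|${clientKey}`);
    if (score > bestScore) {
      best = node;
      bestScore = score;
    }
  }
  return best;
}

// The key could be a client IP (as with ip_hash) or a session ID.
const nodes = ["ws-1", "ws-2", "ws-3"]; // hypothetical backend names
const before = new Map<string, string>();
for (let i = 0; i < 1000; i++) {
  const client = `client-${i}`;
  before.set(client, pickNode(client, nodes));
}

// Take ws-2 out of rotation and count how many clients change nodes.
const remaining = nodes.filter((n) => n !== "ws-2");
let moved = 0;
let movedFromSurvivors = 0;
before.forEach((node, client) => {
  const after = pickNode(client, remaining);
  if (after !== node) {
    moved++;
    if (node !== "ws-2") movedFromSurvivors++;
  }
});
console.log(`moved: ${moved}, moved from surviving nodes: ${movedFromSurvivors}`);
```

With naive `hash % nodes.length`, dropping from three nodes to two remaps roughly two thirds of all clients. NGINX's `hash $remote_addr consistent;` upstream directive addresses the same problem with ketama consistent hashing. Either way, remapping only affects where new connections and reconnects land; open WebSockets stay where they are.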
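Points (5) and (6) suggest scaling on connection counts rather than CPU. The sketch below applies a simplified form of the replica calculation documented for the Kubernetes HPA (desired = ceil(currentReplicas * currentMetric / targetMetric), skipped when the ratio is within the default 0.1 tolerance) to a hypothetical per-pod active-connections metric. Exposing that metric to the HPA, for example through a custom metrics adapter, is assumed and not shown. Stabilization windows, pod readiness, and missing-metric handling are omitted.

```typescript
// Simplified HPA-style scaling rule applied to a connections-per-pod metric.
function desiredReplicas(
  currentReplicas: number,
  avgConnectionsPerPod: number, // from a hypothetical custom metric
  targetConnectionsPerPod: number,
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1, // HPA's default tolerance: ignore ratios within 10% of 1.0
): number {
  const ratio = avgConnectionsPerPod / targetConnectionsPerPod;
  const desired =
    Math.abs(ratio - 1) <= tolerance
      ? currentReplicas
      : Math.ceil(currentReplicas * ratio);
  // Clamp to the configured replica bounds.
  return Math.min(maxReplicas, Math.max(minReplicas, desired));
}

// 4 pods averaging 9,000 connections against a 5,000 target:
// ratio 1.8, ceil(4 * 1.8) = 8 pods.
console.log(desiredReplicas(4, 9000, 5000, 2, 20)); // 8
```

The calculation treats the metric as if load rebalances instantly. In practice a new pod starts with zero connections, and the per-pod average falls only as new or reconnecting clients land on it.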