How would you design a chat application like WhatsApp?
Why Interviewers Ask This
This question tests practical, hands-on system design experience. Interviewers want to see whether you have applied these concepts in real projects rather than only read about them, so strong answers include concrete examples and explicit trade-offs.
Answer
Scale target: WhatsApp-scale chat (2B users, 100B messages/day).

Core features: 1:1 messaging, group chat, message status (sent/delivered/read), online presence, media sharing.

Message flow: sender → WebSocket server → message service → recipient's connection server → recipient's WebSocket. If the recipient is offline, send a push notification and store the message until the recipient next comes online.

Protocol: WebSockets provide persistent, bidirectional connections. WhatsApp itself has historically used a customized variant of XMPP (Extensible Messaging and Presence Protocol).

Connection management: each datacenter holds millions of concurrent WebSocket connections. Use dedicated connection servers. They are stateful, so keep a user-to-server mapping in Redis (user_id → server_id). To route a message, look up the recipient's connection server and forward the message there.

Message storage: store messages until they are delivered, then delete them from the server once delivery is confirmed. This is WhatsApp's approach, where message history lives only on the device, and it also suits end-to-end encrypted systems such as Signal. For cloud backup, use S3. If the product keeps server-side message history, use Cassandra with rows of (user_id, conversation_id, timestamp, message_id, content), which is optimized for time-range reads per conversation.

Group messaging: use server-side fan-out, where the server sends a copy to each group member. For large groups (thousands of members), fan out asynchronously through a message queue.

Presence system: each client sends a heartbeat every 30s that records its "online" status in Redis. Presence uses a subscription model: users subscribe to their contacts' presence via pub/sub.

Media: clients upload directly to S3 using pre-signed URLs. The server stores only metadata, with the thumbnail in Cassandra and the full media in S3, served via a CDN.

E2E encryption: use the Signal protocol. Keys are managed on the devices, and the server only ever sees ciphertext.
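The routing and offline-delivery steps in the answer can be sketched in a few lines. This is a minimal, in-memory illustration, not a production design: plain dictionaries stand in for the Redis user_id → server_id mapping and for the offline message store, and the names `ChatRouter`, `Message`, `connect`, and `send` are hypothetical.

```python
import itertools
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    message_id: int
    sender: str
    recipient: str
    body: str
    status: str = "sent"  # becomes "delivered" once forwarded to a connection server


class ChatRouter:
    """Toy message service; dicts stand in for Redis and the offline store."""

    def __init__(self):
        self.session_map = {}                     # user_id -> server_id (Redis in the answer)
        self.server_inboxes = defaultdict(list)   # server_id -> messages forwarded there
        self.offline_store = defaultdict(list)    # user_id -> messages awaiting delivery
        self.push_notifications = []              # recipients we would notify via push
        self._ids = itertools.count(1)

    def connect(self, user_id, server_id):
        """Register the user's connection server and flush any stored messages."""
        self.session_map[user_id] = server_id
        pending = self.offline_store.pop(user_id, [])
        for msg in pending:
            self._deliver(msg, server_id)
        return pending

    def disconnect(self, user_id):
        self.session_map.pop(user_id, None)

    def send(self, sender, recipient, body):
        msg = Message(next(self._ids), sender, recipient, body)
        server_id = self.session_map.get(recipient)
        if server_id is None:
            # Recipient offline: keep the message and trigger a push notification.
            self.offline_store[recipient].append(msg)
            self.push_notifications.append(recipient)
        else:
            self._deliver(msg, server_id)
        return msg

    def _deliver(self, msg, server_id):
        self.server_inboxes[server_id].append(msg)
        msg.status = "delivered"
```

In a real deployment the delete-after-delivery step would wait for an acknowledgement from the recipient's device; here, forwarding to the connection server is treated as delivery to keep the sketch short.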
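The group fan-out choice (direct delivery for small groups, queued delivery for large ones) might look like the sketch below. The threshold of 256 members is an arbitrary assumption for illustration, `queue.Queue` stands in for a real message queue, and `fan_out`, `drain`, and `deliver` are hypothetical names.

```python
import queue

LARGE_GROUP_THRESHOLD = 256  # assumed cutoff; tune for the real workload


def fan_out(members, sender, body, deliver, async_queue,
            threshold=LARGE_GROUP_THRESHOLD):
    """Send `body` to every member except the sender.

    Small groups are delivered inline; large groups are enqueued so that
    background workers can deliver without blocking the sender's request.
    Returns "sync" or "async" to show which path was taken.
    """
    recipients = [m for m in members if m != sender]
    if len(recipients) <= threshold:
        for recipient in recipients:
            deliver(recipient, body)
        return "sync"
    for recipient in recipients:
        async_queue.put((recipient, body))
    return "async"


def drain(async_queue, deliver):
    """Worker loop body: deliver everything currently queued, return the count."""
    delivered = 0
    while True:
        try:
            recipient, body = async_queue.get_nowait()
        except queue.Empty:
            return delivered
        deliver(recipient, body)
        delivered += 1
```

The trade-off is latency versus isolation: inline fan-out is simpler and faster for typical groups, while the queued path keeps one very large group from stalling the message service.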
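Heartbeat-based presence can be illustrated with a small in-memory tracker. The 30s interval comes from the answer; the expiry window of two missed heartbeats (60s) is an assumption, as are the names `PresenceTracker`, `heartbeat`, and `subscribe`. In Redis this would typically be a key with a TTL plus pub/sub channels; here a dict and callbacks play those roles.

```python
import time
from collections import defaultdict

HEARTBEAT_INTERVAL = 30.0                 # seconds, from the answer
PRESENCE_TTL = 2 * HEARTBEAT_INTERVAL     # assumed: offline after two missed heartbeats


class PresenceTracker:
    def __init__(self, ttl=PRESENCE_TTL, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock                     # injectable so tests can control time
        self.last_seen = {}                    # user_id -> time of last heartbeat
        self.subscribers = defaultdict(list)   # user_id -> callbacks of interested contacts

    def subscribe(self, contact_id, callback):
        """Register interest in a contact's presence changes."""
        self.subscribers[contact_id].append(callback)

    def is_online(self, user_id):
        seen = self.last_seen.get(user_id)
        return seen is not None and self.clock() - seen < self.ttl

    def heartbeat(self, user_id):
        """Record a heartbeat; notify subscribers on an offline-to-online transition."""
        was_online = self.is_online(user_id)
        self.last_seen[user_id] = self.clock()
        if not was_online:
            for callback in self.subscribers[user_id]:
                callback(user_id, "online")
```

This sketch only publishes the "online" transition; going offline is detected lazily by `is_online` when the last heartbeat is older than the TTL.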
Pro Tip
Call out the trade-offs that are specific to chat systems, such as stateful connection servers, delete-after-delivery storage, and the fan-out strategy for large groups. Highlighting these shows expertise rather than generic scalability knowledge.