How would you design an autocomplete / typeahead search system?

Q: How would you design an autocomplete / typeahead search system?

Answer

An autocomplete system suggests completions as the user types, so end-to-end latency has to stay low (<50ms).

Scale: at roughly 5B Google searches/day, and assuming about 5 keystroke-triggered lookups per search on average, that is on the order of 25B prefix lookups/day (about 290K/s averaged over the day).

Core data structure: a trie (prefix tree) storing the candidate queries. Each node represents a prefix; nodes that end a complete query are marked and carry a frequency count. These are not necessarily leaves, since one query can be a prefix of another. Walking the subtree under the current prefix yields all completions, which can then be sorted by frequency.

Top-K retrieval: store the top K (usually 5-10) completions at each trie node, so a lookup returns the cached list instead of traversing the entire subtree. Build the lists bottom-up during trie construction: each node's top-K is merged from its own query (if the node is marked) and its children's top-K lists.

Trie limitations at scale: when the trie is too large to fit on one machine, partition it by prefix (for example A-G on shard 1, H-P on shard 2, Q-Z on shard 3). Prefix popularity is uneven, so the ranges are better chosen from observed traffic than split evenly across the alphabet.

Data pipeline: log search queries → aggregate frequencies with Hadoop/Spark weekly (or daily) → build the trie from the top-N queries → serialize and distribute to the search servers.

Storage: serialize the trie to disk in a binary format and load it into memory on each search server. Redis is an alternative: keep one sorted set per prefix, with the query as the member and its frequency as the score, e.g. ZADD search:prefix:ty <frequency> <query>, and read the top K with ZREVRANGE search:prefix:ty 0 K-1.

Caching: prefix lookups are heavily skewed toward popular prefixes (roughly 80% of lookups hit 20% of prefixes), so cache results for the hottest prefixes at several layers (browser, CDN, application).

Personalization: layer the user's own search history over the global completions and re-rank based on the user's patterns.

Real-time updates: use a streaming pipeline (Kafka) to update frequencies for trending queries without waiting for the full weekly rebuild.

Latency: route the query to the appropriate shard → in-memory trie lookup → return top-K. Total: <10ms server-side.
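
The trie with cached top-K lists described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a production design: the names `TrieNode`, `AutocompleteTrie`, `build_top_k`, and `suggest` are hypothetical, everything lives in one process, and the top-K lists are built once in a batch pass after all queries are inserted.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)   # char -> TrieNode
    frequency: int = 0                             # > 0 only if a complete query ends here
    top_k: list = field(default_factory=list)      # cached [(frequency, query), ...]


class AutocompleteTrie:
    """Hypothetical in-memory trie that caches the top-K completions per node."""

    def __init__(self, k=5):
        self.k = k
        self.root = TrieNode()

    def insert(self, query, frequency):
        # Walk or create one node per character, then mark the final node.
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.frequency = frequency

    def build_top_k(self):
        # Batch step: run once after all inserts (e.g. after the offline rebuild).
        self._build(self.root, "")

    def _build(self, node, prefix):
        # Bottom-up merge: a node's candidates are its own query (if marked)
        # plus the already-computed top-K lists of its children.
        candidates = []
        if node.frequency > 0:
            candidates.append((node.frequency, prefix))
        for ch, child in node.children.items():
            self._build(child, prefix + ch)
            candidates.extend(child.top_k)
        # Highest frequency first; ties broken alphabetically.
        node.top_k = sorted(candidates, key=lambda c: (-c[0], c[1]))[: self.k]

    def suggest(self, prefix):
        # Lookup cost is O(len(prefix)); no subtree traversal is needed.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top_k]


if __name__ == "__main__":
    trie = AutocompleteTrie(k=2)
    for query, freq in [("tyre", 40), ("type", 90), ("typescript", 70), ("tree", 10)]:
        trie.insert(query, freq)
    trie.build_top_k()
    print(trie.suggest("ty"))  # ['type', 'typescript']
    print(trie.suggest("tr"))  # ['tree']
```

Merging only the children's cached lists is sufficient because any query in a subtree's top K must also be in the top K of the child subtree that contains it. The trade-off is memory: every node stores up to K entries, which is one reason the trie is built from the top-N queries instead of the full query log.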
