How would you design an autocomplete / typeahead search system?
Why Interviewers Ask This
Advanced questions like this reveal whether a candidate has internalized System Design deeply enough to make architectural decisions. Strong answers demonstrate both breadth and depth of experience.
Answer
An autocomplete system suggests completions as the user types, with low end-to-end latency (<50ms).
Scale: take 5B searches/day as the Google-scale figure. If 50% of them involve typing at least 5 characters and every keystroke triggers a lookup, that is at least 2.5B × 5 = 12.5B prefix lookups/day.
Core data structure: a trie (prefix tree) storing the candidate queries. Each node represents a prefix; nodes that end a complete query (leaves or marked internal nodes) carry a frequency count. Walking the subtree under the current prefix yields all completions, which are then sorted by frequency.
Top-K retrieval: store the top K (usually 5-10) completions at each trie node, so a lookup returns the cached list instead of traversing the entire subtree. Build the lists by propagating from the leaves upward during trie construction.
Trie limitations at scale: when the trie is too large to fit on one machine, partition it by prefix (for example A-G on shard 1, H-P on shard 2, Q-Z on shard 3).
Data pipeline: log search queries → aggregate frequencies with Hadoop/Spark weekly (or daily) → build the trie from the top-N queries → serialize and distribute it to the search servers.
Storage: serialize the trie to disk in a binary format and load it into memory on each search server. Redis sorted sets are an alternative: keep one sorted set per prefix, with the query as the member and its frequency as the score, for example ZADD search:prefix:ty {frequency} {query}, and read the best completions with ZREVRANGE search:prefix:ty 0 {K-1}.
Caching: lookups concentrate on popular prefixes (roughly 80% of lookups hit 20% of prefixes), so cache the results for those prefixes at several layers: browser, CDN, and application.
Personalization: layer the user's own search history over the global completions and re-rank based on the user's patterns.
Real-time updates: use a streaming pipeline (Kafka) to update frequencies for trending queries without waiting for the full weekly rebuild.
Latency: route the request to the appropriate shard → in-memory trie lookup → return the cached top-K. Total: <10ms server-side, which leaves the rest of the 50ms budget for the network round trip.
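The trie with a cached top-K list per node can be sketched roughly as follows. This is a minimal illustration, not a production design: the names `TrieNode`, `AutocompleteTrie`, `insert`, and `suggest` are hypothetical, and instead of propagating from the leaves upward it takes the simpler route of offering each query to every prefix node at insert time, assuming a batch build in which each query is inserted once.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    # Cached completions for this prefix as (frequency, query), best first.
    top_k: list = field(default_factory=list)


class AutocompleteTrie:
    def __init__(self, k=5):
        self.k = k
        self.root = TrieNode()

    def insert(self, query, frequency):
        """Add a query and offer it to the top-K list of every prefix node."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            self._offer(node, query, frequency)

    def _offer(self, node, query, frequency):
        # Replace any existing entry for this query, then keep the K best.
        entries = [(f, q) for f, q in node.top_k if q != query]
        entries.append((frequency, query))
        # Highest frequency first; ties broken alphabetically for determinism.
        entries.sort(key=lambda e: (-e[0], e[1]))
        node.top_k = entries[: self.k]

    def suggest(self, prefix):
        """Return the cached completions for a prefix without a subtree walk."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top_k]


trie = AutocompleteTrie(k=2)
for q, f in [("tree", 50), ("trie", 80), ("trip", 30), ("type", 90)]:
    trie.insert(q, f)
print(trie.suggest("tr"))
```

The cost of a lookup is proportional to the prefix length only, at the price of extra memory per node and more work at build time.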
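Routing a prefix to its shard under the A-G / H-P / Q-Z split might look like the sketch below. The shard names, the `shard_for` function, and the rule that non-letter prefixes fall through to the last shard are all assumptions made for illustration; a real deployment would likely pick range boundaries from observed traffic rather than from the alphabet.

```python
import bisect

# Hypothetical shard map: each shard owns first letters up to and
# including its upper bound (A-G, H-P, Q-Z as in the answer above).
SHARD_UPPER_BOUNDS = ["g", "p", "z"]
SHARD_NAMES = ["shard-1", "shard-2", "shard-3"]


def shard_for(prefix):
    """Pick the shard that owns the first letter of the prefix."""
    if not prefix:
        raise ValueError("prefix must not be empty")
    first = prefix[:1].lower()
    if not ("a" <= first <= "z"):
        # Assumption: digits, punctuation, and other scripts share one shard.
        return SHARD_NAMES[-1]
    # bisect_left finds the first upper bound that is >= the letter.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, first)
    return SHARD_NAMES[index]


for p in ["apple", "hello", "query"]:
    print(p, shard_for(p))
```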
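One possible way to layer personal history over the global completions is a simple score boost. The `rerank` function, the `boost` parameter, and the scoring formula are hypothetical choices for this sketch; the answer above does not prescribe a particular ranking function.

```python
def rerank(global_suggestions, user_history, boost=2.0):
    """Re-rank global completions using the user's own search counts.

    global_suggestions: list of (query, global_score), best first.
    user_history: dict mapping query -> times this user searched it.
    """
    def score(item):
        query, global_score = item
        # Assumed formula: scale the global score by personal usage.
        return global_score * (1 + boost * user_history.get(query, 0))

    # sorted() is stable, so queries with equal scores keep global order.
    ranked = sorted(global_suggestions, key=score, reverse=True)
    return [query for query, _ in ranked]


suggestions = [("python tutorial", 100), ("python download", 90), ("python snake", 40)]
print(rerank(suggestions, {"python snake": 3}))
```

Because the re-ranking only touches the K results already returned by the trie, it adds little to the server-side latency budget.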
Pro Tip
This topic has System Design-specific nuances that differ from general programming. Highlighting those nuances in your answer shows expertise rather than generic knowledge.