What is the difference between latency and throughput?
Why Interviewers Ask This
Foundational questions like this help interviewers calibrate the rest of the interview. A confident, accurate answer signals that you have solid System Design basics — a prerequisite for any developer role.
Answer
Latency is the time it takes to complete a single operation, from request to response. It is measured in milliseconds (ms) or microseconds (μs). Common types are network latency (time for data to travel), processing latency (time to compute), and database latency (query execution time). Percentiles matter more than averages: p99 latency (the 99th percentile, meaning 99% of requests are faster than this) reveals tail latency that averages hide. A p99 of 500ms means 1% of requests take 500ms or longer, which is a large number of slow requests at scale.
Throughput is the number of operations completed per unit time: requests per second (RPS), transactions per second (TPS), or data volume (GB/s). It measures overall system capacity.
Relationship: the two are related but different. A system can have low latency + high throughput (good); low latency + low throughput (fast but limited capacity); or high latency + high throughput (each request is slow, but the system handles many requests concurrently or in batches).
Little's Law: L = λW, where L is the average number of requests in the system, λ is the arrival rate (equal to throughput in a stable system), and W is the average time a request spends in the system. Doubling throughput at the same latency therefore doubles the number of requests in flight, so the system must support twice the concurrency.
Trade-offs: optimizing for latency often reduces throughput (for example, by limiting concurrency or skipping batching); optimizing for throughput (batching) increases latency, because requests wait for a batch to fill.
Practical benchmarks (approximate orders of magnitude): L1 cache reference ~0.5ns; SSD random read ~100μs; network round-trip within the same datacenter ~0.5ms; disk seek ~10ms; cross-continental round-trip ~150ms.
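As a rough illustration of the percentile and Little's Law points above, here is a minimal Python sketch. The latency sample (980 requests at 20ms and 20 at 500ms), the 1000 requests/s arrival rate, and the helper names percentile and in_flight are all made up for illustration. The percentile uses the nearest-rank method, which is only one of several common definitions.

```python
import math
from statistics import mean


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def in_flight(arrival_rate_per_s, avg_latency_s):
    """Little's Law, L = lambda * W: average number of requests in the system."""
    return arrival_rate_per_s * avg_latency_s


# Hypothetical sample: 980 fast requests (20 ms) and 20 slow ones (500 ms).
latencies_ms = [20] * 980 + [500] * 20

avg_ms = mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

# Assumed arrival rate of 1000 requests/s, then the same system at 2000 requests/s.
concurrency = in_flight(1000, avg_ms / 1000)
concurrency_doubled = in_flight(2000, avg_ms / 1000)

print(f"mean={avg_ms:.1f}ms p50={p50_ms}ms p99={p99_ms}ms")
print(f"in flight at 1000 req/s: {concurrency:.1f}")
print(f"in flight at 2000 req/s: {concurrency_doubled:.1f}")
```

With this sample the mean is 29.6ms while the p99 is 500ms, so the average alone would hide the slow tail. Doubling the arrival rate at the same average latency doubles the number of requests in flight, which is the concurrency the system has to sustain.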
Pro Tip
Before answering, structure your response: one-line definition → real-world analogy → concrete example from a project. This makes even complex System Design answers easy to follow.