What is rate limiting?

Answer

Rate limiting controls how many requests a client can make within a time window, protecting the system from abuse, DDoS attacks, and accidental overload.

Algorithms:
(1) Token bucket: a bucket holds up to N tokens and is refilled at rate R. Each request consumes one token; if the bucket is empty, the request is rejected. Allows short bursts up to the bucket size while holding the long-run average to R.
(2) Leaky bucket: requests enter a fixed-size queue (the bucket) that is drained at a constant rate; requests that arrive while the queue is full are dropped. Smooths bursts into a steady output rate instead of passing them through.
(3) Fixed window counter: count requests in fixed time windows (e.g., per minute) and reject once the count reaches the limit N. Simple, but has an edge case: a burst at the end of one window followed by a burst at the start of the next lets up to 2N requests through in a span shorter than a single window.
(4) Sliding window log: store the timestamp of every request and count those that fall within the past window. Accurate but memory-intensive, since one entry is kept per request.
(5) Sliding window counter: a hybrid that keeps fixed-window counts and estimates the sliding count as the current window's count plus the previous window's count weighted by the fraction of that window still inside the sliding window. Good balance of accuracy and memory.

Implementation: typically Redis atomic operations (INCR + EXPIRE for a fixed window, run together in a transaction or Lua script so the key always gets a TTL; a sorted set with ZADD + ZRANGEBYSCORE for a sliding window).
Rate limit by: IP address, user ID, API key, or endpoint.
Response: return HTTP 429 Too Many Requests with a Retry-After header that tells the client how long to wait.
Distributed rate limiting: a shared Redis instance keeps limits consistent across multiple app servers.
Use cases: API quotas, login attempt limiting (brute-force prevention), scraping protection.
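
To make two of the algorithms above concrete, here is a minimal single-process sketch of a token bucket and a sliding window counter in Python. It is an illustration under stated assumptions, not a production implementation: the class names, the `allow()` method, and the injectable `clock` parameter are all hypothetical choices made for this example, state lives in local memory rather than Redis, and there is no locking for concurrent callers.

```python
import time


class TokenBucket:
    """Token bucket: up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_refill = clock()

    def allow(self):
        """Consume one token and return True, or return False if the bucket is empty."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class SlidingWindowCounter:
    """Sliding window counter: at most `limit` requests per `window` seconds (approximate)."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.current_index = None  # index of the fixed window currently being counted
        self.current_count = 0
        self.previous_count = 0

    def allow(self):
        """Return True if the weighted estimate is under the limit, else False."""
        now = self.clock()
        index = int(now // self.window)
        if index != self.current_index:
            # Entering a new fixed window. The old count only matters if it belongs
            # to the window immediately before this one; otherwise it has expired.
            is_adjacent = self.current_index is not None and index == self.current_index + 1
            self.previous_count = self.current_count if is_adjacent else 0
            self.current_count = 0
            self.current_index = index
        # Fraction of the current fixed window that has already elapsed.
        elapsed_fraction = (now % self.window) / self.window
        # The previous window contributes only the part still inside the sliding window.
        estimated = self.previous_count * (1 - elapsed_fraction) + self.current_count
        if estimated < self.limit:
            self.current_count += 1
            return True
        return False
```

In a request handler, a rejected call (`allow()` returning False) would map to the HTTP 429 response described above. For a distributed deployment, the same state (token count and last refill time, or the two window counts) would need to live in a shared store such as Redis and be updated atomically.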
