What it is
A rate limiting algorithm decides whether a given request should be allowed or rejected based on how many requests have been made within a time window or against a token budget. The algorithm determines the shape of enforcement: strict smoothing, burst tolerance, memory cost, and boundary behavior.
When you care
Rate limiting shows up in system design interviews whenever the question involves an API gateway, a multi-tenant service, or abuse protection. The trap is saying “we’ll use a counter” without naming the algorithm or its boundary behavior. Interviewers probe whether you understand the burst-at-boundary problem with fixed windows, why sliding window log is memory-expensive, and why token bucket is the industry default.
The algorithms
| Algorithm | Memory per key | Burst handling | Boundary problem? | Implementation |
|---|---|---|---|---|
| Token bucket | O(1) — 2 values | Allows bursts up to bucket capacity | No | Counter + timestamp |
| Leaking bucket | O(1) — bounded queue | Strict smoothing, no bursts | No | FIFO queue |
| Fixed window counter | O(1) — 1 counter | None — full limit can land at once | Yes — 2x burst at edge | Counter + window start |
| Sliding window log | O(n) — all timestamps | Precise | No | Sorted timestamp set |
| Sliding window counter | O(1) — 2 counters | Approximated | Minimal | Weighted sum of two counters |
Token bucket
Mechanism. A bucket holds up to capacity tokens. Tokens refill at a fixed rate (e.g., 10/sec). Each request costs one token. If the bucket is empty, reject. Refill is calculated lazily — no background thread, just `tokens = min(capacity, tokens + elapsed × rate)` on each check.
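A minimal single-process sketch of that lazy-refill check, in Python (the class shape and names are illustrative; a production limiter keeps this state per client key and makes the read-modify-write atomic, e.g., in Redis):

```python
import time

class TokenBucket:
    """Lazy-refill token bucket: only two stored values, tokens and a timestamp."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity           # maximum tokens the bucket holds
        self.rate = rate                   # tokens refilled per second
        self.tokens = capacity             # start full so idle clients can burst
        self.last_check = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_check
        # Lazy refill: credit tokens for elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_check = now
        if self.tokens >= 1:
            self.tokens -= 1               # this request spends one token
            return True
        return False
```

With `capacity=10` and `rate=10`, a client that has been idle for a second can fire 10 requests back to back and then settles to 10/sec, which is exactly the burst tolerance described above.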
When to use. Default choice for most rate limiters. Natural burst tolerance (a client that was idle accumulates tokens and can spend them in a burst), low memory, and simple to explain to teams configuring limits. Used by AWS API Gateway, Stripe, and most cloud rate limiters.
Tradeoff. Bursts are allowed by design. If you need strict smoothing (exactly N requests per second, never more), token bucket is the wrong choice.
Leaking bucket
Mechanism. Requests enter a FIFO queue of fixed size. The queue drains at a constant rate. If the queue is full, reject. Output rate is perfectly smooth regardless of input burstiness.
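A Python sketch of the queue-and-drain shape (the `handle` stub and the thread-per-bucket layout are illustrative assumptions; real systems usually drain from the server's worker or event loop):

```python
import queue
import threading
import time

def handle(request) -> None:
    """Stand-in for the downstream call; replace with real work."""
    print("served", request)

class LeakyBucket:
    """Bounded FIFO drained at a constant rate; a full queue means reject."""

    def __init__(self, size: int, drain_rate: float):
        self.requests = queue.Queue(maxsize=size)
        self.interval = 1.0 / drain_rate       # seconds between served requests
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, request) -> bool:
        try:
            self.requests.put_nowait(request)  # accepted: queued for later service
            return True
        except queue.Full:
            return False                       # bucket overflow: reject immediately

    def _drain(self):
        while True:
            handle(self.requests.get())        # blocks until a request is queued
            time.sleep(self.interval)          # enforce the smooth output rate
```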
When to use. When downstream requires a strictly constant request rate — e.g., a third-party API that penalizes bursts, or a network shaping scenario.
Tradeoff. Bursts are absorbed (queued), not served. Clients experience increased latency during bursts rather than immediate rejection. Harder to implement as a pure check-and-reject — it’s fundamentally a queue, not a counter.
Fixed window counter
Mechanism. Divide time into fixed windows (e.g., 0:00–0:59, 1:00–1:59). Maintain one counter per window. Increment on each request; reject if counter exceeds limit.
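A single-key Python sketch of the counter-per-window check (in production this is commonly a shared counter, e.g., in Redis, keyed by the window start):

```python
import time

class FixedWindowCounter:
    """One counter plus the start time of the window it belongs to."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.window_start = 0
        self.count = 0

    def allow(self) -> bool:
        now = int(time.time())
        window_start = now - (now % self.window)  # e.g., top of the current minute
        if window_start != self.window_start:     # crossed into a new window
            self.window_start = window_start
            self.count = 0                        # reset: the cause of the edge burst
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```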
When to use. When simplicity matters more than precision, and the 2x boundary-burst problem is acceptable. Good for coarse limits (daily quotas) where the boundary effect is negligible.
Tradeoff. At the boundary between two windows, a client can make limit requests at 0:59 and limit more at 1:00 — effectively 2x the intended rate for a one-second span. This is the well-known boundary spike problem.
Sliding window log
Mechanism. Store the timestamp of every request in a sorted set. On each check, remove all timestamps older than `now - window`, then count the remaining entries. If count >= limit, reject.
When to use. When you need perfectly accurate per-window counting and can afford the memory. Works well at low request volumes or for expensive operations where overcounting is costly.
Tradeoff. Memory is O(requests per window) per key. At 1000 req/sec with a 60-second window, that’s 60K timestamps per key — untenable at scale with millions of keys. Redis `ZRANGEBYSCORE` makes it easy to implement but expensive to run.
Sliding window counter
Mechanism. Keep counters for the current and previous fixed window. Approximate the sliding window count as `count = prev_window_count × overlap_fraction + current_window_count`. For example, if we’re 30 seconds into the current minute, the overlap fraction of the previous window is 0.5.
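A Python sketch of the two-counter approximation (the window indexing and names are illustrative):

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window built from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.current_idx = -1   # index of the fixed window `current` belongs to
        self.current = 0
        self.previous = 0

    def allow(self) -> bool:
        now = time.time()
        idx = int(now // self.window)   # which fixed window we are in
        if idx != self.current_idx:
            # Roll forward; if more than one window passed, both counts are stale.
            self.previous = self.current if idx == self.current_idx + 1 else 0
            self.current = 0
            self.current_idx = idx
        # Fraction of the previous window still covered by the sliding window:
        # 30s into a 60s window gives overlap 0.5, matching the example above.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = self.previous * overlap + self.current
        if estimated < self.limit:
            self.current += 1
            return True
        return False
```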
When to use. When you want sliding window accuracy without the memory cost of the log approach. A good middle ground — eliminates the boundary spike while staying O(1) per key.
Tradeoff. The count is an approximation. In practice the error is small (Cloudflare measured <0.003% false positive rate), but it’s not exact. If your system requires provably correct counting (billing, compliance), use sliding window log or token bucket instead.
When to pick what
- Default for API rate limiting: token bucket. Burst-friendly, O(1), lazy refill, well-understood by teams configuring rules.
- Strict output smoothing: leaking bucket. When downstream demands constant rate.
- Coarse quotas (daily/monthly): fixed window counter. Boundary spike is negligible at large windows.
- Precise per-window counting at low volume: sliding window log. When you can afford the memory.
- Precise per-window counting at high volume: sliding window counter. Best accuracy-to-memory ratio.
In interviews, name the algorithm, state why you picked it over the alternatives, and acknowledge the tradeoff. That’s the signal.
Related
- Walkthrough: Designing a Rate Limiter — applies token bucket in a full distributed rate limiter design, including the Redis Lua implementation.
- Walkthrough: Designing an Ad Click Aggregation System — a related counting problem where the challenge is exactly-once semantics rather than rejection.