Reference: Rate Limiting Algorithms

A short reference on the five canonical rate limiting algorithms — token bucket, leaking bucket, fixed window counter, sliding window log, and sliding window counter — with guidance on when to pick each.

What it is

A rate limiting algorithm decides whether a given request should be allowed or rejected based on how many requests have been made within a time window or against a token budget. The algorithm determines the shape of enforcement: strict smoothing, burst tolerance, memory cost, and boundary behavior.

When you care

Rate limiting shows up in system design interviews whenever the question involves an API gateway, a multi-tenant service, or abuse protection. The trap is saying “we’ll use a counter” without naming the algorithm or its boundary behavior. Interviewers probe whether you understand the burst-at-boundary problem with fixed windows, why sliding window log is memory-expensive, and why token bucket is the industry default.

The algorithms

| Algorithm | Memory per key | Burst handling | Boundary problem? | Implementation |
| --- | --- | --- | --- | --- |
| Token bucket | O(1) — 2 values | Allows bursts up to bucket capacity | No | Counter + timestamp |
| Leaking bucket | O(1) — queue size | Strict smoothing, no bursts | No | FIFO queue |
| Fixed window counter | O(1) — 1 counter | None | Yes — 2x burst at edge | Counter + window start |
| Sliding window log | O(n) — all timestamps | Precise | No | Sorted timestamp set |
| Sliding window counter | O(1) — 2 counters | Approximated | Minimal | Weighted average of two windows |

Token bucket

Mechanism. A bucket holds up to capacity tokens. Tokens refill at a fixed rate (e.g., 10/sec). Each request costs one token. If the bucket is empty, reject. Refill is calculated lazily — no background thread, just tokens = min(capacity, tokens + elapsed × rate) on each check.
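A minimal single-process sketch of this lazy refill in Python (class and method names are illustrative; a production limiter would keep one bucket per client key and guard against concurrent access):

```python
import time

class TokenBucket:
    """Lazy-refill token bucket: no background thread, refill on each check."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.rate = rate                # tokens refilled per second
        self.tokens = capacity          # start full
        self.last = time.monotonic()    # time of the last refill calculation

    def allow(self) -> bool:
        now = time.monotonic()
        # tokens = min(capacity, tokens + elapsed * rate)
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request costs one token
            return True
        return False                    # bucket empty: reject
```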

When to use. Default choice for most rate limiters. Natural burst tolerance (a client that was idle accumulates tokens and can spend them in a burst), low memory, and simple to explain to teams configuring limits. Used by AWS API Gateway, Stripe, and most cloud rate limiters.

Tradeoff. Bursts are allowed by design. If you need strict smoothing (exactly N requests per second, never more), token bucket is the wrong choice.

Leaking bucket

Mechanism. Requests enter a FIFO queue of fixed size. The queue drains at a constant rate. If the queue is full, reject. Output rate is perfectly smooth regardless of input burstiness.
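A sketch of the admission decision only, under the assumption that we just need accept/reject (Python; names are illustrative). It drains a virtual queue lazily; a real leaking bucket would also hold the accepted requests and serve them at the drain rate, which is what makes it a queue rather than a counter:

```python
import time
from collections import deque

class LeakingBucket:
    """Admission check against a fixed-size queue drained at a constant rate."""

    def __init__(self, queue_size: int, leak_rate: float):
        self.queue_size = queue_size    # max requests waiting in the bucket
        self.leak_rate = leak_rate      # requests drained per second
        self.queue = deque()            # timestamps of queued requests
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if not self.queue:
            self.last_leak = now        # nothing to drain; reset the clock
        else:
            # Lazily drain as many requests as the elapsed time permits.
            leaked = int((now - self.last_leak) * self.leak_rate)
            if leaked > 0:
                for _ in range(min(leaked, len(self.queue))):
                    self.queue.popleft()
                self.last_leak += leaked / self.leak_rate
        if len(self.queue) < self.queue_size:
            self.queue.append(now)      # request joins the queue
            return True
        return False                    # queue full: reject
```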

When to use. When downstream requires a strictly constant request rate — e.g., a third-party API that penalizes bursts, or a network shaping scenario.

Tradeoff. Bursts are absorbed (queued), not served. Clients experience increased latency during bursts rather than immediate rejection. Harder to implement as a pure check-and-reject — it’s fundamentally a queue, not a counter.

Fixed window counter

Mechanism. Divide time into fixed windows (e.g., 0:00–0:59, 1:00–1:59). Maintain one counter per window. Increment on each request; reject if counter exceeds limit.
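A sketch using clock-aligned window ids, so the counter resets exactly at window boundaries (Python; names are illustrative):

```python
import time

class FixedWindowCounter:
    """One counter per clock-aligned fixed window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.window_id = int(time.time() // window_seconds)
        self.count = 0

    def allow(self) -> bool:
        window_id = int(time.time() // self.window)  # which window are we in?
        if window_id != self.window_id:
            self.window_id = window_id               # window rolled over
            self.count = 0                           # reset the counter
        if self.count < self.limit:
            self.count += 1
            return True
        return False                                 # over the limit: reject
```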

When to use. When simplicity matters more than precision, and the 2x boundary-burst problem is acceptable. Good for coarse limits (daily quotas) where the boundary effect is negligible.

Tradeoff. At the boundary between two windows, a client can send limit requests just before the rollover (at 0:59) and limit more immediately after (at 1:00) — 2x the intended rate, concentrated in a short span straddling the boundary. This is the well-known boundary spike problem.

Sliding window log

Mechanism. Store the timestamp of every request in a sorted set. On each check, remove all timestamps older than now - window, then count remaining entries. If count >= limit, reject.
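An in-memory sketch using a deque, which stays sorted because timestamps arrive in order (Python; a Redis implementation would use a sorted set per key, as noted below):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Exact sliding window: store every request timestamp."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()              # arrival order, so already sorted

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps older than now - window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False                # count >= limit: reject
        self.log.append(now)
        return True
```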

When to use. When you need perfectly accurate per-window counting and can afford the memory. Works well at low request volumes or for expensive operations where overcounting is costly.

Tradeoff. Memory is O(requests per window) per key. At 1000 req/sec with a 60-second window, that’s 60K timestamps per key — untenable at scale with millions of keys. Redis ZRANGEBYSCORE makes it easy to implement but expensive to run.

Sliding window counter

Mechanism. Keep counters for the current and previous fixed window. Approximate the sliding window count as: count = prev_window_count × overlap_fraction + current_window_count. For example, if we’re 30 seconds into the current minute, the overlap fraction of the previous window is 0.5.
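A sketch of the two-counter approximation (Python; names are illustrative). The overlap variable is the fraction of the previous fixed window still covered by the sliding window, matching the 0.5 in the example above:

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.window_id = int(time.time() // window_seconds)
        self.current = 0                # count in the current fixed window
        self.previous = 0               # count in the previous fixed window

    def allow(self) -> bool:
        now = time.time()
        window_id = int(now // self.window)
        if window_id != self.window_id:
            # Roll over; if more than one window passed, the previous is empty.
            self.previous = self.current if window_id == self.window_id + 1 else 0
            self.current = 0
            self.window_id = window_id
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = self.previous * overlap + self.current
        if estimated >= self.limit:
            return False
        self.current += 1
        return True
```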

When to use. When you want sliding window accuracy without the memory cost of the log approach. A good middle ground — eliminates the boundary spike while staying O(1) per key.

Tradeoff. The count is an approximation. In practice the error is small (Cloudflare measured <0.003% false positive rate), but it’s not exact. If your system requires provably correct counting (billing, compliance), use sliding window log or token bucket instead.

When to pick what

  • Default for API rate limiting: token bucket. Burst-friendly, O(1), lazy refill, well-understood by teams configuring rules.
  • Strict output smoothing: leaking bucket. When downstream demands constant rate.
  • Coarse quotas (daily/monthly): fixed window counter. Boundary spike is negligible at large windows.
  • Precise per-window counting at low volume: sliding window log. When you can afford the memory.
  • Precise per-window counting at high volume: sliding window counter. Best accuracy-to-memory ratio.

In interviews, name the algorithm, state why you picked it over the alternatives, and acknowledge the tradeoff. That’s the signal.