Reference: Rate Limiting Algorithms

A short reference on the five canonical rate limiting algorithms — token bucket, leaking bucket, fixed window counter, sliding window log, and sliding window counter — with guidance on when to pick each.

What it is

A rate limiting algorithm decides whether a given request should be allowed or rejected based on how many requests have been made within a time window or against a token budget. The algorithm determines the shape of enforcement: strict smoothing, burst tolerance, memory cost, and boundary behavior.

When you care

Rate limiting shows up in system design interviews whenever the question involves an API gateway, a multi-tenant service, or abuse protection. The trap is saying “we’ll use a counter” without naming the algorithm or its boundary behavior. Interviewers probe whether you understand the burst-at-boundary problem with fixed windows, why sliding window log is memory-expensive, and why token bucket is the industry default.

The algorithms

| Algorithm | Memory per key | Burst handling | Boundary problem? | Implementation |
| --- | --- | --- | --- | --- |
| Token bucket | O(1) — 2 values | Allows bursts up to bucket capacity | No | Counter + timestamp |
| Leaking bucket | O(1) — queue size | Strict smoothing, no bursts | No | FIFO queue |
| Fixed window counter | O(1) — 1 counter | None | Yes — 2x burst at edge | Counter + window start |
| Sliding window log | O(n) — all timestamps | Precise | No | Sorted timestamp set |
| Sliding window counter | O(1) — 2 counters | Approximated | Minimal | Weighted average of two windows |

Token bucket

Mechanism. A bucket holds up to capacity tokens. Tokens refill at a fixed rate (e.g., 10/sec). Each request costs one token. If the bucket is empty, reject. Refill is calculated lazily — no background thread, just tokens = min(capacity, tokens + elapsed × rate) on each check.
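A minimal single-process sketch of this lazy refill in Python (class and method names are illustrative; a production limiter would keep one bucket per client key and guard against concurrent access):

```python
import time

class TokenBucket:
    """Lazy-refill token bucket: no background thread, refill on each check."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity        # max tokens the bucket can hold
        self.rate = rate                # tokens refilled per second
        self.tokens = capacity          # start full
        self.last = time.monotonic()    # time of the last refill calculation

    def allow(self) -> bool:
        now = time.monotonic()
        # tokens = min(capacity, tokens + elapsed * rate)
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # each request costs one token
            return True
        return False                    # bucket empty: reject
```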

When to use. Default choice for most rate limiters. Natural burst tolerance (a client that was idle accumulates tokens and can spend them in a burst), low memory, and simple to explain to teams configuring limits. Used by AWS API Gateway, Stripe, and most cloud rate limiters.

Tradeoff. Bursts are allowed by design. If you need strict smoothing (exactly N requests per second, never more), token bucket is the wrong choice.

Leaking bucket

Mechanism. Requests enter a FIFO queue of fixed size. The queue drains at a constant rate. If the queue is full, reject. Output rate is perfectly smooth regardless of input burstiness.
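A sketch of the admission decision only, under the assumption that we just need accept/reject (Python; names are illustrative). It drains a virtual queue lazily; a real leaking bucket would also hold the accepted requests and serve them at the drain rate, which is what makes it a queue rather than a counter:

```python
import time
from collections import deque

class LeakingBucket:
    """Admission check against a fixed-size queue drained at a constant rate."""

    def __init__(self, queue_size: int, leak_rate: float):
        self.queue_size = queue_size    # max requests waiting in the bucket
        self.leak_rate = leak_rate      # requests drained per second
        self.queue = deque()            # timestamps of queued requests
        self.last_leak = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        if not self.queue:
            self.last_leak = now        # nothing to drain; reset the clock
        else:
            # Lazily drain as many requests as the elapsed time permits.
            leaked = int((now - self.last_leak) * self.leak_rate)
            if leaked > 0:
                for _ in range(min(leaked, len(self.queue))):
                    self.queue.popleft()
                self.last_leak += leaked / self.leak_rate
        if len(self.queue) < self.queue_size:
            self.queue.append(now)      # request joins the queue
            return True
        return False                    # queue full: reject
```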

When to use. When downstream requires a strictly constant request rate — e.g., a third-party API that penalizes bursts, or a network shaping scenario.

Tradeoff. Bursts are absorbed (queued), not served. Clients experience increased latency during bursts rather than immediate rejection. Harder to implement as a pure check-and-reject — it’s fundamentally a queue, not a counter.

Fixed window counter

Mechanism. Divide time into fixed windows (e.g., 0:00–0:59, 1:00–1:59). Maintain one counter per window. Increment on each request; reject if counter exceeds limit.
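A sketch using clock-aligned window ids, so the counter resets exactly at window boundaries (Python; names are illustrative):

```python
import time

class FixedWindowCounter:
    """One counter per clock-aligned fixed window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.window_id = int(time.time() // window_seconds)
        self.count = 0

    def allow(self) -> bool:
        window_id = int(time.time() // self.window)  # which window are we in?
        if window_id != self.window_id:
            self.window_id = window_id               # window rolled over
            self.count = 0                           # reset the counter
        if self.count < self.limit:
            self.count += 1
            return True
        return False                                 # over the limit: reject
```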

When to use. When simplicity matters more than precision, and the 2x boundary-burst problem is acceptable. Good for coarse limits (daily quotas) where the boundary effect is negligible.

Tradeoff. At the boundary between two windows, a client can send limit requests just before the rollover (at 0:59) and limit more immediately after (at 1:00) — 2x the intended rate, concentrated in a short span straddling the boundary. This is the well-known boundary spike problem.

Sliding window log

Mechanism. Store the timestamp of every request in a sorted set. On each check, remove all timestamps older than now - window, then count remaining entries. If count >= limit, reject.
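An in-memory sketch using a deque, which stays sorted because timestamps arrive in order (Python; a Redis implementation would use a sorted set per key, as noted below):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Exact sliding window: store every request timestamp."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()              # arrival order, so already sorted

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps older than now - window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False                # count >= limit: reject
        self.log.append(now)
        return True
```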

When to use. When you need perfectly accurate per-window counting and can afford the memory. Works well at low request volumes or for expensive operations where overcounting is costly.

Tradeoff. Memory is O(requests per window) per key. At 1000 req/sec with a 60-second window, that’s 60K timestamps per key — untenable at scale with millions of keys. Redis ZRANGEBYSCORE makes it easy to implement but expensive to run.

Sliding window counter

Mechanism. Keep counters for the current and previous fixed window. Approximate the sliding window count as: count = prev_window_count × overlap_fraction + current_window_count. For example, if we’re 30 seconds into the current minute, the overlap fraction of the previous window is 0.5.
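A sketch of the two-counter approximation (Python; names are illustrative). The overlap variable is the fraction of the previous fixed window still covered by the sliding window, matching the 0.5 in the example above:

```python
import time

class SlidingWindowCounter:
    """Approximate sliding window from two fixed-window counters."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.window_id = int(time.time() // window_seconds)
        self.current = 0                # count in the current fixed window
        self.previous = 0               # count in the previous fixed window

    def allow(self) -> bool:
        now = time.time()
        window_id = int(now // self.window)
        if window_id != self.window_id:
            # Roll over; if more than one window passed, the previous is empty.
            self.previous = self.current if window_id == self.window_id + 1 else 0
            self.current = 0
            self.window_id = window_id
        # Fraction of the previous window still inside the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = self.previous * overlap + self.current
        if estimated >= self.limit:
            return False
        self.current += 1
        return True
```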

When to use. When you want sliding window accuracy without the memory cost of the log approach. A good middle ground — eliminates the boundary spike while staying O(1) per key.

Tradeoff. The count is an approximation. In practice the error is small (Cloudflare measured <0.003% false positive rate), but it’s not exact. If your system requires provably correct counting (billing, compliance), use sliding window log or token bucket instead.

When to pick what

  • Default for API rate limiting: token bucket. Burst-friendly, O(1), lazy refill, well-understood by teams configuring rules.
  • Strict output smoothing: leaking bucket. When downstream demands constant rate.
  • Coarse quotas (daily/monthly): fixed window counter. Boundary spike is negligible at large windows.
  • Precise per-window counting at low volume: sliding window log. When you can afford the memory.
  • Precise per-window counting at high volume: sliding window counter. Best accuracy-to-memory ratio.

In interviews, name the algorithm, state why you picked it over the alternatives, and acknowledge the tradeoff. That’s the signal.