The problem
You’re asked to design a news feed — Facebook’s home timeline, Twitter’s home timeline, Instagram’s feed. The interviewer says something like: “Design the service that shows a user the posts from people they follow, ordered by time (or relevance). Support posting and viewing the feed.”
Sounds like a list. It isn’t. This is the canonical “fan-out at scale” problem, and the trap is the celebrity problem: if one user is followed by 100M others, a naive design either writes to 100M inboxes on every post (write storm) or scans 100M authors on every read (read storm). Committing to push, pull, or hybrid — and justifying it for this system — is what the question is really testing.
Below is how I’d walk through this, start to finish.
1. Clarify before you design
First 3–5 minutes. Questions I’d ask:
- Scale. How many users? How many posts per user per day? What’s the average follow count? What’s the max follow count (for celebrities)? The system design primer numbers are a reasonable anchor, but for a feed the distribution matters more than the average.
- Read/write ratio. A timeline is the classic read-heavy system — typically 100:1 or higher. This shapes whether we optimize the read path or the write path.
- Ordering. Strictly chronological? Relevance-ranked (ML)? Hybrid? Default for a senior-level interview: relevance-ranked, because that’s what real products do and it forces a richer design.
- Feed freshness. How stale can a feed be? Seconds? Minutes? A chronological feed has tighter freshness requirements than a ranked one.
- Media support. Just text, or images/video too? Media changes the storage cost and CDN story significantly. Usually worth deferring — say we’ll mention it and move on.
- Cross-feed features. Re-shares, reply threads, quote posts? These change the data model and the fan-out shape.
Say the interviewer confirms: 500M DAU, average user posts 2/day (1B posts/day), average follow count ~300, max ~100M for celebrities, relevance-ranked feed, freshness tolerance ~30 seconds, text + image, support re-shares.
2. Capacity estimate
- Posts: 1B/day ≈ 12K writes/sec average, ~50K/sec peak
- Feed reads: each DAU loads their feed ~10× per day ≈ 5B reads/day ≈ 60K reads/sec average, ~300K/sec peak. Already ~5:1 read/write, and that’s before we count “refreshes” and scroll-driven pagination.
- Storage: 1B posts/day × 365 days × ~500 bytes (text + metadata, excluding media) ≈ 180 TB/year just for post content. Media goes to object storage separately.
- Fan-out: if we push, 1B posts × 300 avg followers = 300B writes/day to inboxes. That’s ~3.5M writes/sec. This is the number that decides the architecture.
I’d say out loud: “That 3.5M writes/sec on pure push is the single biggest design constraint. The whole fan-out question is really about whether we can avoid paying that cost for everyone.”
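A quick sanity check on the arithmetic (a throwaway sketch in Python; the constants are just the assumed numbers from the clarification step):

```python
SECONDS_PER_DAY = 86_400

dau = 500_000_000
posts_per_day = 1_000_000_000           # 2 posts per DAU per day
feed_loads_per_day = dau * 10           # ~10 feed loads per DAU per day
avg_followers = 300
bytes_per_post = 500                    # text + metadata, excluding media

post_writes_per_sec = posts_per_day / SECONDS_PER_DAY                    # ~12K
feed_reads_per_sec = feed_loads_per_day / SECONDS_PER_DAY                # ~60K
inbox_writes_per_sec = posts_per_day * avg_followers / SECONDS_PER_DAY   # ~3.5M under pure push
storage_tb_per_year = posts_per_day * 365 * bytes_per_post / 1e12        # ~180 TB

print(f"{post_writes_per_sec:,.0f} post writes/sec")
print(f"{feed_reads_per_sec:,.0f} feed reads/sec")
print(f"{inbox_writes_per_sec:,.0f} inbox writes/sec under pure push")
print(f"{storage_tb_per_year:.0f} TB/year of post content")
```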
3. API design
Three endpoints do most of the work:
POST /api/posts
body: { author_id, text, media_ids?, parent_post_id? }
returns: { post_id, created_at }
GET /api/feed?cursor=<opaque>&limit=20
returns: { posts: [...], next_cursor }
POST /api/follow
body: { follower_id, followee_id }
returns: { ok }
Non-obvious decision: cursor-based pagination, not offset-based. Offset-based pagination breaks when new posts arrive between pages — the user sees duplicates or misses posts. Cursors (opaque, typically (score, post_id) encoded) are stable against inserts. Worth saying out loud; interviewers notice.
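To make the cursor concrete, a minimal sketch of encoding and decoding an opaque (score, post_id) cursor; the field names and base64+JSON packing are illustrative choices, not a fixed format:

```python
import base64
import json

def encode_cursor(score: float, post_id: int) -> str:
    """Pack the last-returned (score, post_id) into an opaque token."""
    raw = json.dumps({"s": score, "p": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple[float, int]:
    """Unpack the token; the next page starts strictly after this position."""
    fields = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return fields["s"], fields["p"]
```

The server resumes the scan at "everything scored below the cursor", so posts inserted above it never shift the page boundaries.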
4. Core design choice: fan-out strategy
This is the question. Three options:
(a) Fan-out on write (push). When a user posts, write the post_id into the inbox of every follower. Each user’s feed is then a trivial read: scan your own inbox, hydrate post content, return.
- Pros: Feed reads are cheap — one partition, one scan.
- Cons: Write amplification. For a celebrity with 100M followers, one post = 100M inbox writes. The 3.5M writes/sec estimate from Section 2 is the floor on average users; celebrities make it worse.
(b) Fan-out on read (pull). Keep posts in each author’s outbox. On feed read, fetch each followee’s recent outbox, merge, rank, return.
- Pros: Writes are O(1). No write amplification.
- Cons: Read amplification. For a user who follows 300 people, every feed load = 300 outbox reads. At 300K feed reads/sec × 300 followees = 90M outbox reads/sec. This is a read storm.
(c) Hybrid. Push for ordinary users; pull for celebrities. A user’s feed is (pushed inbox) ∪ (pulled celebrity outboxes) — merged and ranked at read time.
I’d commit to (c) hybrid, with a follower-count threshold. The justification is topic-specific: the follower-count distribution in social networks is power-law — a tiny fraction of users account for most of the fan-out cost. Push the 99.9%, pull the 0.1%. This is exactly the architecture Twitter described for their home timeline, and the reasoning generalizes: in any system where the cost distribution is power-law, the fix is a per-entity policy, not a one-size-fits-all protocol.
The threshold is tunable (say, 1M followers). Users above threshold don’t fan out on write; their posts are pulled at read time and merged into followers’ feeds. Below threshold, standard push.
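The routing rule itself is tiny; the work is everything around it. A minimal sketch, assuming a single global follower-count threshold (per the 1M figure above):

```python
CELEBRITY_THRESHOLD = 1_000_000  # tunable operational parameter, not a constant

def fanout_mode(follower_count: int) -> str:
    """Per-author policy: push to follower inboxes, or leave the post in the
    author's outbox to be pulled and merged at read time."""
    return "pull" if follower_count >= CELEBRITY_THRESHOLD else "push"
```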
This is where SDE II vs III diverges most. SDE II might commit to push or pull. SDE III names the asymmetry in the follower distribution, reaches for hybrid, and names the threshold as a tunable operational parameter.
5. Data model and storage
Three core stores:
posts (authoritative post content):
post_id (PK)
author_id
text
media_ids
parent_post_id
created_at
inbox (per-user, pushed posts):
user_id (partition key)
score (sort key — ranking signal or timestamp)
post_id
author_id
follows (social graph):
follower_id
followee_id
created_at
Storage choices:
- posts — immutable after creation (edits are rare and a separate problem), keyed by post_id. A wide-column store like Cassandra is a natural fit: partition by post_id, cheap to write, cheap to point-lookup, no joins needed.
- inbox — the critical structure. Per-user partition, sorted by score. Writes are small and frequent. Reads are range scans (“top 20 for this user”). Either Cassandra with a clustering key on score, or Redis sorted sets if we want the hot working set in memory (more on this below).
- follows — the graph. Sharded MySQL, DynamoDB, or a purpose-built graph store. The access pattern is “who does user X follow” and “who follows user X” — both are O(follower_count) lookups. A regular sharded DB is fine; a graph database only pays off if we need 2+ hop queries, which a timeline doesn’t.
Topic-specific justification worth saying out loud: “The inbox is essentially a per-user priority queue. That’s exactly what Redis sorted sets are built for, and it’s why we’ll see them in the read path. We’re not caching general data — we’re using the data structure that natively matches the access pattern.”
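To make that concrete, a sketch of the inbox as a Redis sorted set using redis-py; the key naming and trim size are assumptions, not a prescribed schema:

```python
import redis

r = redis.Redis()
INBOX_MAX_LEN = 800  # keep only the best-scored entries per user; older ones age out

def push_to_inbox(user_id: int, post_id: int, score: float) -> None:
    """Write one post into a follower's inbox and trim the tail."""
    key = f"inbox:{user_id}"
    r.zadd(key, {str(post_id): score})
    # remove everything below the top INBOX_MAX_LEN entries by score
    r.zremrangebyrank(key, 0, -(INBOX_MAX_LEN + 1))

def read_inbox(user_id: int, limit: int = 20) -> list[str]:
    """Top-N post_ids for a user, highest score first."""
    return [m.decode() for m in r.zrevrange(f"inbox:{user_id}", 0, limit - 1)]
```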
6. The read path
At 300K reads/sec peak, we cache aggressively. The flow:
flowchart LR
Client[Client] --> API[API gateway]
API --> Feed[Feed service]
Feed --> Inbox[(Redis inbox · sorted set)]
Inbox -->|miss or cold| InboxDB[(Cassandra inbox)]
Feed --> Posts[(Post service)]
Posts --> PostCache[(Redis post cache)]
PostCache -->|miss| PostsDB[(Cassandra posts)]
Two layers of caching, each doing a different job:
- Inbox cache (Redis sorted set per active user). Holds the top N post_ids for each user. TTL-based: warm on read, evict when cold. Hot-user coverage is high because active-user access is itself power-law — the top 10% of users drive the bulk of feed reads.
- Post content cache (Redis hash by post_id). Holds the hydrated post body. Once a post is cached, reads from any viewer are served from memory.
Cache invalidation is where this differs from simpler systems. Posts are effectively immutable (edits are rare, deletes even rarer) — so the post cache needs almost no invalidation; a long TTL is fine. The inbox is different: a new push must write through to the cache so the next feed read sees it. The write flow (next section) handles this.
Topic-specific justification: “Caching posts is cheap because they don’t change. Caching inboxes works because the active-user set is bounded and power-law — we don’t need to cache every user’s inbox, just the top few percent who generate most reads.”
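A sketch of the post-hydration step with cache-aside; the Cassandra accessor is a stand-in, and the long TTL leans on the posts-are-immutable point above:

```python
import json
import redis

r = redis.Redis()
POST_TTL_SECONDS = 7 * 24 * 3600  # posts are effectively immutable, so a long TTL is safe

def fetch_post_from_cassandra(post_id: str) -> dict:
    """Stand-in for the authoritative Cassandra point lookup (not shown)."""
    raise NotImplementedError

def get_post(post_id: str) -> dict:
    """Cache-aside: try Redis, fall back to the authoritative store, backfill the cache."""
    cached = r.get(f"post:{post_id}")
    if cached is not None:
        return json.loads(cached)
    post = fetch_post_from_cassandra(post_id)
    r.set(f"post:{post_id}", json.dumps(post), ex=POST_TTL_SECONDS)
    return post

def hydrate(post_ids: list[str]) -> list[dict]:
    return [get_post(pid) for pid in post_ids]
```

A production path would batch the lookups (MGET plus one multi-row DB read) and coalesce concurrent misses; the point here is only the shape of the cache-aside flow.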
7. Fan-out on write: extending the architecture
Now the write path. A post lands, we push to followers’ inboxes — but only for non-celebrity authors.
Extending the architecture:
flowchart LR
Client[Client] --> API[API gateway]
API --> Feed[Feed service]
Feed --> Inbox[(Redis inbox · sorted set)]
Inbox -->|miss or cold| InboxDB[(Cassandra inbox)]
Feed --> Posts[(Post service)]
Posts --> PostCache[(Redis post cache)]
PostCache -->|miss| PostsDB[(Cassandra posts)]
Author[Author] --> API
API --> PostService[Post service]
PostService --> PostsDB
PostService -->|emit post event| Kafka[(Kafka · posts topic)]
Kafka --> Fanout[Fan-out workers]
Fanout --> Follows[(Follows DB)]
Fanout --> InboxDB
Fanout --> Inbox
classDef new fill:#eef2f1,stroke:#2c5f5d,stroke-width:2px,color:#1f2937;
class Author,PostService,Kafka,Fanout,Follows new
The write path step by step:
- Author posts via the API.
- Post service writes to posts (authoritative) and emits a post event to Kafka.
- Fan-out workers consume the event. They check the author’s follower count in the follows DB.
- If below threshold (e.g., < 1M followers): fetch the follower list, batch-write post_id into each follower’s inbox (both Redis and Cassandra).
- If above threshold: skip fan-out entirely. The post stays in posts; it will be pulled at read time by any follower viewing their feed.
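A sketch of a fan-out worker, reusing the fanout_mode rule from Section 4 and the push_to_inbox helper from Section 5. The topic name, event shape, and the follows/Cassandra accessors are assumptions for illustration:

```python
import json
from kafka import KafkaConsumer  # kafka-python; any consumer client works

consumer = KafkaConsumer("posts", group_id="fanout-workers")

def handle_post_event(event: dict) -> None:
    author_id = event["author_id"]
    follower_count = get_follower_count(author_id)  # follows DB, assumed accessor
    if fanout_mode(follower_count) == "pull":
        return  # celebrity: no push; followers pull from the author's outbox at read time
    for batch in iter_followers(author_id, batch_size=1000):  # assumed accessor
        write_inbox_rows_to_cassandra(batch, event)            # durable inbox, assumed writer
        for follower_id in batch:
            # write-through so the next feed read sees the post without a cache miss
            push_to_inbox(follower_id, event["post_id"], event["score"])

for message in consumer:
    handle_post_event(json.loads(message.value))
```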
Why async via Kafka? Three reasons, in order:
- Decouples write latency from fan-out work. The author’s POST /posts returns as soon as the authoritative write lands — not after millions of inbox writes.
- Absorbs fan-out bursts. When a popular author just below the threshold posts to hundreds of thousands of followers, the queue smooths the load so the inbox DB isn’t overwhelmed in a spike.
- Enables replay. If a fan-out worker bug drops events, we replay from Kafka. If we add a new index (e.g., “notifications” alongside “inbox”), we replay to populate it.
Topic-specific justification: “We pay the write-amplification cost only for users where it’s cheap — ordinary users with small follower counts. For the expensive 0.1%, the cost shifts to read time, where it’s amortized across followers who actually load their feeds.”
8. The celebrity problem: pull at read time
For users above the threshold, their posts aren’t pushed. On feed read, the feed service fetches recent posts from each celebrity the user follows — their outbox — and merges with the pushed inbox.
The “outbox” is just a per-author index into posts, sorted by time. Cassandra again: partition by author_id, cluster by created_at. Cheap to read the last N.
flowchart LR
Client[Client] --> API[API gateway]
API --> Feed[Feed service]
Feed --> Inbox[(Redis inbox · sorted set)]
Inbox -->|miss or cold| InboxDB[(Cassandra inbox)]
Feed --> Posts[(Post service)]
Posts --> PostCache[(Redis post cache)]
PostCache -->|miss| PostsDB[(Cassandra posts)]
Author[Author] --> API
API --> PostService[Post service]
PostService --> PostsDB
PostService -->|emit post event| Kafka[(Kafka · posts topic)]
Kafka --> Fanout[Fan-out workers]
Fanout --> Follows[(Follows DB)]
Fanout --> InboxDB
Fanout --> Inbox
Feed --> CelebOutbox[(Celebrity outbox · per author)]
Feed --> Ranker[Ranker]
Ranker --> Feed
classDef new fill:#eef2f1,stroke:#2c5f5d,stroke-width:2px,color:#1f2937;
class CelebOutbox,Ranker new
Feed read, step by step:
- Feed service fetches top N from the user’s pushed inbox (Redis).
- Feed service fetches recent posts from every celebrity the user follows (typically a small number — users follow few celebrities compared to total follows).
- Merge: deduplicate, combine scores.
- Rank: the ranker scores each post using signals (recency, author-engagement, viewer-author interaction strength, content features). For an interview, I’d name the signals and gesture at a learned model — a full ML pipeline is out of scope.
- Return top K with a cursor.
Topic-specific justification: “Pulling on read for celebrities works because the number of celebrities a user follows is small — often single digits. So we pay a bounded read cost per feed load, not the 300-outbox-read disaster of a pure pull design.”
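A sketch of that merge, assuming stand-in accessors for the celebrity list, the per-author outboxes, and the ranker (the ranker is the black box discussed above), plus the read_inbox and hydrate helpers from Sections 5 and 6:

```python
def load_feed(user_id: int, limit: int = 20) -> list[dict]:
    pushed = read_inbox(user_id, limit=200)                    # Section 5 helper: pushed inbox
    celebs = get_followed_celebrities(user_id)                 # assumed accessor; usually a handful
    pulled = [pid for c in celebs for pid in read_outbox(c, limit=50)]  # assumed accessor

    candidate_ids = list(dict.fromkeys(pushed + pulled))       # merge + dedupe, order-preserving
    candidates = hydrate(candidate_ids)                        # Section 6 helper: post content
    scored = rank(user_id, candidates)                         # assumed ranker: (score, post) pairs
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [post for _, post in scored[:limit]]
```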
9. Sharding
- posts — partitioned by post_id. Random, no hot partitions (post_ids are generated via Snowflake or similar, so they spread evenly).
- inbox — partitioned by user_id. Natural choice: all reads are “my inbox”, all writes go to a specific user’s inbox. Hot-partition risk is real for accounts that themselves follow tens of thousands of others (their inboxes absorb a high write rate), but manageable.
- follows — partitioned by follower_id for the “who do I follow” query, with a secondary index (or denormalized twin table) by followee_id for “who follows me” — needed by the fan-out workers. Denormalization is cheaper than a cross-partition scan.
- Redis cluster — consistent-hashed on user_id for inbox, post_id for post cache. Standard.
Multi-region isn’t essential for an interview answer here, but worth naming: users typically read from their home region; async cross-region replication handles the edge case of a user traveling.
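On the post_id point above: a sketch of Snowflake-style ID composition, to show why the IDs spread evenly across partitions while still sorting roughly by time. The bit layout follows the original Twitter scheme (41-bit timestamp, 10-bit worker, 12-bit sequence); the epoch is an arbitrary assumption:

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_577_836_800_000  # 2020-01-01 UTC, an assumed epoch
_lock = threading.Lock()
_last_ms = -1
_sequence = 0

def snowflake_id(worker_id: int) -> int:
    """41 bits of ms since epoch | 10 bits worker | 12 bits per-ms sequence."""
    global _last_ms, _sequence
    with _lock:
        now_ms = int(time.time() * 1000) - CUSTOM_EPOCH_MS
        if now_ms == _last_ms:
            _sequence = (_sequence + 1) & 0xFFF  # real implementations wait for the
        else:                                    # next millisecond on sequence overflow
            _last_ms = now_ms
            _sequence = 0
        return (now_ms << 22) | ((worker_id & 0x3FF) << 12) | _sequence
```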
10. Failure modes
Four to walk through — name what fails, name what degrades gracefully, name what doesn’t.
- Fan-out workers fall behind. Kafka absorbs the backlog. Followers see delayed posts but no data loss. The feed is stale, not wrong. Mitigation: scale workers horizontally; Kafka consumer groups handle rebalancing. If the backlog keeps growing, we have a real capacity problem — page someone.
- Redis inbox cluster loses a node. Thundering herd of cold reads onto Cassandra. Mitigations: request coalescing (one miss per user triggers one DB read; a minimal sketch follows this list), circuit breakers — see Martin Fowler on the pattern — and pre-warming from a secondary tier. The feed degrades to slower reads, not broken reads.
- Follows DB partition unavailable. Fan-out workers for affected followers stall. Users in other partitions are unaffected. New follows into the affected partition fail; existing reads that hit the partition fail. This is the hardest failure mode because it’s a correctness-adjacent problem — if we can’t fetch the follower list, we can’t fan out correctly, and the missed post events have to be replayed from Kafka once the partition recovers. Mitigation: follows should be in a multi-master or quorum-replicated store.
- Ranker service is down. Fall back to chronological ordering. The feed is still usable, just less good. This is the cleanest graceful degradation in the system, and worth calling out — it’s a signal that we’ve thought about ranking as a dependency, not a hard requirement.
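The request-coalescing idea from the Redis-failure bullet, sketched for a single process. This is an illustration of the shape only (error handling and result caching elided); in practice you would reach for an existing single-flight utility:

```python
import threading

_inflight: dict[str, threading.Event] = {}
_results: dict[str, object] = {}
_lock = threading.Lock()

def coalesced_read(key: str, load_from_db):
    """Many concurrent misses for the same key trigger exactly one DB read."""
    with _lock:
        event = _inflight.get(key)
        leader = event is None
        if leader:
            event = threading.Event()
            _inflight[key] = event
    if leader:
        try:
            _results[key] = load_from_db(key)  # the single DB read
        finally:
            event.set()                        # wake the waiters
            with _lock:
                del _inflight[key]
    else:
        event.wait()                           # piggyback on the leader's read
    return _results.get(key)
```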
11. What I’d skip, and say I’m skipping
Time check — five minutes left. Things I’d explicitly defer:
- The ranking model itself. A real feed’s ranker is a multi-stage ML system (candidate generation, feature hydration, scoring, diversification, policy filters). Worth mentioning as a black box; not worth sketching the whole architecture. This is a system that contains a second system.
- Media pipeline. Images and video are stored in object storage behind a CDN, with transcoding for different resolutions. Say “CDN + S3” and move on.
- Notifications. A peer system to the feed, sharing the fan-out infrastructure. Worth a sentence; not a section.
- Spam and integrity. Content moderation, rate limits on posting, fake accounts. Real systems need this; interviews don’t require you to solve it.
- Direct messaging / replies as threads. Different access pattern — reply threads are more like a chat system than a feed. Out of scope for “design the home timeline”.
Saying “I’d skip this, and here’s why” is a strong SDE III signal. It shows you know the full surface and are making deliberate scoping choices.
12. Wrap-up
One crisp sentence before the interviewer’s next question:
This design optimizes feed reads by precomputing most users’ inboxes, and handles the celebrity fan-out problem by inverting the cost to read time for the 0.1% of users where write-side fan-out doesn’t scale. It trades strict freshness for bounded cost.
What separates SDE II from SDE III on this question
- SDE II usually lands on push or pull, describes one fan-out strategy, names the inbox/outbox structure, and sketches a basic read path.
- SDE III names the power-law follower distribution as the reason for hybrid, picks a threshold with a specific rationale, treats the async fan-out pipeline as a peer system (not an implementation detail), names 4+ failure modes with graceful degradation paths, and explicitly defers ranking and media as “second systems” that don’t belong in this answer.
The difference isn’t knowledge. It’s naming the asymmetry — that the cost distribution is non-uniform, and the design should be non-uniform to match.
Further reading
- Twitter’s Timeline at Scale (InfoQ) — the definitive talk on hybrid fan-out, the celebrity problem, and the operational reality of running a large-scale timeline. The clearest primary source on this question.
- Designing Data-Intensive Applications — Kleppmann. Chapters 1, 5, and 11 cover the async-messaging, replication, and stream-processing patterns that this design leans on.
- System Design Interview Vol. 1 — Alex Xu. Chapter 11 is the textbook walkthrough for news feed.
- Twitter Snowflake — the ID-generation scheme that makes post_id partitioning clean.
- Power law distribution — the statistical shape that justifies the hybrid strategy.
- Cache-aside pattern — the pattern both cache layers use.