What it is
SQL databases model data as relations (tables with typed columns and enforced relationships). NoSQL is a catch-all for everything else — key-value, document, wide-column, graph. The design question isn’t “which is better” but “which access pattern does this data have, and which store fits it.”
When you care
Every system design question picks a database. Candidates who pick “Postgres” or “Cassandra” without naming the access pattern that drives the choice lose signal. The interviewer wants to hear the reasoning: what reads, what writes, what scale, what consistency.
When to pick what
| Data shape | Pick | Why |
|---|---|---|
| Relational, ACID-critical (orders, payments, inventory) | SQL (Postgres, MySQL) | Transactions, joins, mature tooling. |
| High-cardinality KV lookup (URL shortener, session store, feature flags) | KV (DynamoDB, Redis, Cassandra) | Point lookups at scale; no joins needed. |
| Wide rows, append-heavy (events, time-series, clicks) | Wide-column (Cassandra, Bigtable) | Efficient range scans within a partition. |
| Document-shaped, varied fields (catalogs, user profiles with flexible schemas) | Document (MongoDB, DynamoDB) | Nested structure without schema rigidity. |
| Graph traversal (social, recommendations, fraud) | Graph (Neo4j, JanusGraph) | Efficient multi-hop traversal. |
| Analytics, aggregations (dashboards, BI) | OLAP / columnar (ClickHouse, Druid, Snowflake) | Column-store compression and group-by speed. |
| Full-text search | Search (Elasticsearch, OpenSearch) | Inverted index, relevance scoring, faceting. |
The honest rule: default to Postgres until an access pattern specifically disqualifies it. SQL databases handle more than most interviewees credit them with — JSONB columns, partial indexes, logical replication, and read replicas cover 80% of cases often attributed to NoSQL.
SQL schema design
- Normalize first, denormalize with reason. Start at third normal form. Denormalize only when a specific read pattern’s latency justifies the duplicated data and the maintenance cost.
- Primary keys are not optional. Every table has one. Use integer auto-increment or UUID based on distribution needs.
- Indexes follow the query. Add an index when a query is slow, not speculatively. Each index costs write throughput.
- Foreign keys enforce invariants. Use them unless a specific scaling constraint demands otherwise — then document why.
- JSONB for flexible fields. Postgres JSONB plus GIN indexes covers many “we need flexible schema” cases without leaving SQL.
NoSQL schema design
The central rule: design the schema around the queries, not the data. NoSQL systems don’t rescue you from bad access patterns; they fail catastrophically when keys and queries aren’t aligned.
Key design
Most NoSQL systems use a compound key with two parts:
- Partition key — determines which node stores the data. The hash space is sharded across nodes; the partition key decides which shard.
- Sort key (clustering key) — orders rows within a partition, enables range queries.
Example (Cassandra / DynamoDB):
Table: clicks
Partition key: ad_id
Sort key: timestamp
Other columns: user_id, country, device, ...
Queries like “all clicks for ad X between time T1 and T2” become fast: one partition, sorted by time, ranged-read. Queries like “all clicks from country Y” require a secondary index or a full scan — usually the wrong access pattern for this schema.
The hot-key problem
A partition key concentrates traffic for one logical entity on one node. If one entity dominates traffic, one node melts while the others idle.
Example: the ad_id partition key in the table above. During the Super
Bowl, one ad might get 10% of total traffic, sending 10% of your writes
to one Cassandra node.
Mitigations:
| Technique | Mechanism | Tradeoff |
|---|---|---|
| Key salting | Partition by (ad_id, hash(time) % N). | Spreads writes across N sub-partitions. Queries must scan all N. |
| Write sharding | Client picks one of N partition suffixes per write. | Same as salting, client-driven. |
| Time bucketing | Partition by (ad_id, day) so hot keys rotate. | Queries confined to one day per partition. |
| Read-side caching | Cache hot keys at a Redis tier in front. | Doesn’t help writes; helps reads. |
| Tiered reads | Read replicas or CDN for read-heavy hot keys. | Same — read-side only. |
Key salting is the most common answer at interview scale. The merge cost on the read side (scanning N sub-partitions and combining) is usually acceptable; the win on the write side is enormous.
Secondary indexes and denormalization
NoSQL secondary indexes exist but have significant caveats:
- Cassandra secondary indexes scan all partitions and don’t scale. Use materialized views or a separate table keyed differently instead.
- DynamoDB GSIs (global secondary indexes) are separate tables that replicate asynchronously. Eventual consistency, extra cost.
- The general rule: if you need to query by a non-partition-key field, maintain a separate table keyed by that field. Write to both. This is denormalization, and it’s the correct pattern in NoSQL.
Consistency
Most NoSQL systems are tunable: you pick the consistency level per
operation. Dynamo-style systems use quorum configuration (R + W > N
for strong consistency where R = read replicas contacted, W = write
replicas required, N = total replicas). The PACELC framing — during
partition, Availability or Consistency; else, Latency or Consistency —
is the frame most NoSQL stores fit into.
When to pick what
- Default for transactional data: SQL (Postgres).
- Default for high-volume KV lookup: DynamoDB or Cassandra.
- Event logs, clicks, time-series writes: Cassandra, with partition-key design informed by the dominant read pattern.
- Search and faceting: Elasticsearch alongside your primary store.
- Analytics and dashboards: a columnar OLAP store (ClickHouse, Druid, Snowflake) fed from the primary store.
- In interviews: name the partition key and the query it serves. Name one hot-key risk and one mitigation. That’s the full signal.
Related
- Walkthrough: Designing a URL Shortener — KV schema design at scale; partition by
short_code. - Walkthrough: Designing an Ad Click Aggregation System — wide-column schema with explicit hot-key salting.
- Walkthrough: Designing a RAG System — three-store design (relational + vector + inverted index) for one product.