Reference: SQL vs NoSQL Schema Design

What it is

SQL databases model data as relations (tables with typed columns and enforced relationships). NoSQL is a catch-all for everything else — key-value, document, wide-column, graph. The design question isn’t “which is better” but “which access pattern does this data have, and which store fits it.”

When you care

Every system design question picks a database. Candidates who pick “Postgres” or “Cassandra” without naming the access pattern that drives the choice lose signal. The interviewer wants to hear the reasoning: what reads, what writes, what scale, what consistency.

When to pick what

Data shape	Pick	Why
Relational, ACID-critical (orders, payments, inventory)	SQL (Postgres, MySQL)	Transactions, joins, mature tooling.
High-cardinality KV lookup (URL shortener, session store, feature flags)	KV (DynamoDB, Redis, Cassandra)	Point lookups at scale; no joins needed.
Wide rows, append-heavy (events, time-series, clicks)	Wide-column (Cassandra, Bigtable)	Efficient range scans within a partition.
Document-shaped, varied fields (catalogs, user profiles with flexible schemas)	Document (MongoDB, DynamoDB)	Nested structure without schema rigidity.
Graph traversal (social, recommendations, fraud)	Graph (Neo4j, JanusGraph)	Efficient multi-hop traversal.
Analytics, aggregations (dashboards, BI)	OLAP / columnar (ClickHouse, Druid, Snowflake)	Column-store compression and group-by speed.
Full-text search	Search (Elasticsearch, OpenSearch)	Inverted index, relevance scoring, faceting.

The honest rule: default to Postgres until an access pattern specifically disqualifies it. SQL databases handle more than most interviewees credit them with — JSONB columns, partial indexes, logical replication, and read replicas cover 80% of cases often attributed to NoSQL.

SQL schema design

Normalize first, denormalize with reason. Start at third normal form. Denormalize only when a specific read pattern’s latency justifies the duplicated data and the maintenance cost.
Primary keys are not optional. Every table has one. Use integer auto-increment or UUID based on distribution needs.
Indexes follow the query. Add an index when a query is slow, not speculatively. Each index costs write throughput.
Foreign keys enforce invariants. Use them unless a specific scaling constraint demands otherwise — then document why.
JSONB for flexible fields. Postgres JSONB plus GIN indexes covers many “we need flexible schema” cases without leaving SQL.

NoSQL schema design

The central rule: design the schema around the queries, not the data. NoSQL systems don’t rescue you from bad access patterns; they fail catastrophically when keys and queries aren’t aligned.

Key design

Most NoSQL systems use a compound key with two parts:

Partition key — determines which node stores the data. The hash space is sharded across nodes; the partition key decides which shard.
Sort key (clustering key) — orders rows within a partition, enables range queries.

Example (Cassandra / DynamoDB):

Table: clicks
  Partition key: ad_id
  Sort key: timestamp
  Other columns: user_id, country, device, ...

Queries like “all clicks for ad X between time T1 and T2” become fast: one partition, sorted by time, ranged-read. Queries like “all clicks from country Y” require a secondary index or a full scan — usually the wrong access pattern for this schema.

The hot-key problem

A partition key concentrates traffic for one logical entity on one node. If one entity dominates traffic, one node melts while the others idle.

Example: the ad_id partition key in the table above. During the Super Bowl, one ad might get 10% of total traffic, sending 10% of your writes to one Cassandra node.

Mitigations:

Technique	Mechanism	Tradeoff
Key salting	Partition by `(ad_id, hash(time) % N)`.	Spreads writes across N sub-partitions. Queries must scan all N.
Write sharding	Client picks one of N partition suffixes per write.	Same as salting, client-driven.
Time bucketing	Partition by `(ad_id, day)` so hot keys rotate.	Queries confined to one day per partition.
Read-side caching	Cache hot keys at a Redis tier in front.	Doesn’t help writes; helps reads.
Tiered reads	Read replicas or CDN for read-heavy hot keys.	Same — read-side only.

Key salting is the most common answer at interview scale. The merge cost on the read side (scanning N sub-partitions and combining) is usually acceptable; the win on the write side is enormous.

Secondary indexes and denormalization

NoSQL secondary indexes exist but have significant caveats:

Cassandra secondary indexes scan all partitions and don’t scale. Use materialized views or a separate table keyed differently instead.
DynamoDB GSIs (global secondary indexes) are separate tables that replicate asynchronously. Eventual consistency, extra cost.
The general rule: if you need to query by a non-partition-key field, maintain a separate table keyed by that field. Write to both. This is denormalization, and it’s the correct pattern in NoSQL.

Consistency

Most NoSQL systems are tunable: you pick the consistency level per operation. Dynamo-style systems use quorum configuration (R + W > N for strong consistency where R = read replicas contacted, W = write replicas required, N = total replicas). The PACELC framing — during partition, Availability or Consistency; else, Latency or Consistency — is the frame most NoSQL stores fit into.

When to pick what

Default for transactional data: SQL (Postgres).
Default for high-volume KV lookup: DynamoDB or Cassandra.
Event logs, clicks, time-series writes: Cassandra, with partition-key design informed by the dominant read pattern.
Search and faceting: Elasticsearch alongside your primary store.
Analytics and dashboards: a columnar OLAP store (ClickHouse, Druid, Snowflake) fed from the primary store.
In interviews: name the partition key and the query it serves. Name one hot-key risk and one mitigation. That’s the full signal.

Walkthrough: Designing a URL Shortener — KV schema design at scale; partition by short_code.
Walkthrough: Designing an Ad Click Aggregation System — wide-column schema with explicit hot-key salting.
Walkthrough: Designing a RAG System — three-store design (relational + vector + inverted index) for one product.

Tags #system-design#concepts#databases#sql#nosql#reference

Reference: SQL vs NoSQL Schema Design

What it is

When you care

When to pick what

SQL schema design

NoSQL schema design

Key design

The hot-key problem

Secondary indexes and denormalization

Consistency

When to pick what

Related