Skip to main content

Reference: SQL vs NoSQL Schema Design

A short reference on when to pick SQL vs NoSQL, how to model schemas in each, and how to avoid hot keys.

What it is

SQL databases model data as relations (tables with typed columns and enforced relationships). NoSQL is a catch-all for everything else — key-value, document, wide-column, graph. The design question isn’t “which is better” but “which access pattern does this data have, and which store fits it.”

When you care

Every system design question picks a database. Candidates who pick “Postgres” or “Cassandra” without naming the access pattern that drives the choice lose signal. The interviewer wants to hear the reasoning: what reads, what writes, what scale, what consistency.

When to pick what

Data shapePickWhy
Relational, ACID-critical (orders, payments, inventory)SQL (Postgres, MySQL)Transactions, joins, mature tooling.
High-cardinality KV lookup (URL shortener, session store, feature flags)KV (DynamoDB, Redis, Cassandra)Point lookups at scale; no joins needed.
Wide rows, append-heavy (events, time-series, clicks)Wide-column (Cassandra, Bigtable)Efficient range scans within a partition.
Document-shaped, varied fields (catalogs, user profiles with flexible schemas)Document (MongoDB, DynamoDB)Nested structure without schema rigidity.
Graph traversal (social, recommendations, fraud)Graph (Neo4j, JanusGraph)Efficient multi-hop traversal.
Analytics, aggregations (dashboards, BI)OLAP / columnar (ClickHouse, Druid, Snowflake)Column-store compression and group-by speed.
Full-text searchSearch (Elasticsearch, OpenSearch)Inverted index, relevance scoring, faceting.

The honest rule: default to Postgres until an access pattern specifically disqualifies it. SQL databases handle more than most interviewees credit them with — JSONB columns, partial indexes, logical replication, and read replicas cover 80% of cases often attributed to NoSQL.

SQL schema design

  • Normalize first, denormalize with reason. Start at third normal form. Denormalize only when a specific read pattern’s latency justifies the duplicated data and the maintenance cost.
  • Primary keys are not optional. Every table has one. Use integer auto-increment or UUID based on distribution needs.
  • Indexes follow the query. Add an index when a query is slow, not speculatively. Each index costs write throughput.
  • Foreign keys enforce invariants. Use them unless a specific scaling constraint demands otherwise — then document why.
  • JSONB for flexible fields. Postgres JSONB plus GIN indexes covers many “we need flexible schema” cases without leaving SQL.

NoSQL schema design

The central rule: design the schema around the queries, not the data. NoSQL systems don’t rescue you from bad access patterns; they fail catastrophically when keys and queries aren’t aligned.

Key design

Most NoSQL systems use a compound key with two parts:

  • Partition key — determines which node stores the data. The hash space is sharded across nodes; the partition key decides which shard.
  • Sort key (clustering key) — orders rows within a partition, enables range queries.

Example (Cassandra / DynamoDB):

Table: clicks
  Partition key: ad_id
  Sort key: timestamp
  Other columns: user_id, country, device, ...

Queries like “all clicks for ad X between time T1 and T2” become fast: one partition, sorted by time, ranged-read. Queries like “all clicks from country Y” require a secondary index or a full scan — usually the wrong access pattern for this schema.

The hot-key problem

A partition key concentrates traffic for one logical entity on one node. If one entity dominates traffic, one node melts while the others idle.

Example: the ad_id partition key in the table above. During the Super Bowl, one ad might get 10% of total traffic, sending 10% of your writes to one Cassandra node.

Mitigations:

TechniqueMechanismTradeoff
Key saltingPartition by (ad_id, hash(time) % N).Spreads writes across N sub-partitions. Queries must scan all N.
Write shardingClient picks one of N partition suffixes per write.Same as salting, client-driven.
Time bucketingPartition by (ad_id, day) so hot keys rotate.Queries confined to one day per partition.
Read-side cachingCache hot keys at a Redis tier in front.Doesn’t help writes; helps reads.
Tiered readsRead replicas or CDN for read-heavy hot keys.Same — read-side only.

Key salting is the most common answer at interview scale. The merge cost on the read side (scanning N sub-partitions and combining) is usually acceptable; the win on the write side is enormous.

Secondary indexes and denormalization

NoSQL secondary indexes exist but have significant caveats:

  • Cassandra secondary indexes scan all partitions and don’t scale. Use materialized views or a separate table keyed differently instead.
  • DynamoDB GSIs (global secondary indexes) are separate tables that replicate asynchronously. Eventual consistency, extra cost.
  • The general rule: if you need to query by a non-partition-key field, maintain a separate table keyed by that field. Write to both. This is denormalization, and it’s the correct pattern in NoSQL.

Consistency

Most NoSQL systems are tunable: you pick the consistency level per operation. Dynamo-style systems use quorum configuration (R + W > N for strong consistency where R = read replicas contacted, W = write replicas required, N = total replicas). The PACELC framing — during partition, Availability or Consistency; else, Latency or Consistency — is the frame most NoSQL stores fit into.

When to pick what

  • Default for transactional data: SQL (Postgres).
  • Default for high-volume KV lookup: DynamoDB or Cassandra.
  • Event logs, clicks, time-series writes: Cassandra, with partition-key design informed by the dominant read pattern.
  • Search and faceting: Elasticsearch alongside your primary store.
  • Analytics and dashboards: a columnar OLAP store (ClickHouse, Druid, Snowflake) fed from the primary store.
  • In interviews: name the partition key and the query it serves. Name one hot-key risk and one mitigation. That’s the full signal.