System Design 06: The CAP Theorem - Understanding the Fundamental Trade-Off
TL;DR
- The CAP theorem states that a distributed system can only guarantee 2 out of 3: Consistency, Availability, and Partition Tolerance
- Since network partitions are inevitable, the real choice is between CP (consistency over availability) and AP (availability over consistency)
- Most modern systems don't pick one extreme — they tune consistency on a per-operation basis (e.g., DynamoDB offers both eventual and strong consistency reads)
The CAP theorem is the fundamental constraint of distributed systems. Since network failures are unavoidable, the real choice is between prioritizing consistency or availability. Real-world systems handle these trade-offs on a per-operation basis.
Introduction
The CAP theorem is one of the most important theoretical foundations for understanding distributed systems.
In the previous articles, we've discussed replication — copying data across multiple servers. But what happens when those servers can't communicate with each other?
This is where the CAP theorem comes in. It defines the fundamental trade-off that every distributed database must make. Understanding CAP is essential because it explains why databases like DynamoDB, Cassandra, and PostgreSQL behave differently under failure conditions.
In this article, we'll cover:
- What the CAP theorem actually says (and what it doesn't)
- How real databases position themselves on the CAP spectrum
- How DynamoDB lets you choose consistency on a per-read basis
Core Concepts
The Three Properties
What C, A, and P actually mean.
Consistency (C)
Every read receives the most recent write or an error. All nodes see the same data at the same time.
Example: You update your username to "Bob." If the system is consistent, every subsequent read from any node returns "Bob" — never the old value "Alice."
Availability (A)
Every request receives a non-error response, even if some nodes are down. The system always responds.
Example: Even if one database server crashes, the system still returns data. It might be slightly stale, but it responds — no errors, no timeouts.
Partition Tolerance (P)
The system continues to operate even when network communication between nodes is lost. Messages between servers can be dropped or delayed.
Example: The network cable between your US data center and your EU data center is cut. Both data centers keep running independently.
Why "Pick 2" Is Misleading
The phrase "pick 2 out of 3" can be misleading.
The CAP theorem is often presented as "pick 2 out of 3," but that framing hides the real choice. Here's why:
Network partitions are not optional. In any distributed system, network failures will happen — switches fail, cables get cut, cloud availability zones lose connectivity. You can't choose to not have partitions.
Since P (Partition Tolerance) is mandatory, the real choice is:
- CP: During a partition, maintain consistency by refusing to serve requests (some requests fail)
- AP: During a partition, maintain availability by serving potentially stale data (all requests succeed, but data might be inconsistent)
CP vs AP: What Happens During a Partition?
How CP and AP differ when a network failure occurs.
Imagine a database with 2 nodes: Node A in the US, Node B in Europe. A network partition occurs — they can't communicate.
CP system (e.g., traditional PostgreSQL with synchronous replication):
A user in Europe tries to write data. Node B can't confirm the write with Node A, so the write does not complete: it is rejected, or it blocks until the client times out. The system is unavailable for writes, but when the partition heals, data is guaranteed to be consistent.
AP system (e.g., Cassandra with eventual consistency):
A user in Europe writes data to Node B. Node B accepts the write locally without confirming with Node A. Meanwhile, Node A also accepts writes. When the partition heals, the system must reconcile potentially conflicting writes.
| Scenario | CP System | AP System |
|---|---|---|
| During partition | Some requests fail | All requests succeed |
| Data consistency | Guaranteed | May have conflicts |
| After partition heals | No conflicts to resolve | Must reconcile conflicts |
| Best for | Banking, inventory | Social media, shopping carts |
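To make the table concrete, here is a minimal sketch of the two-node scenario. It is a toy in-memory model, not how PostgreSQL or Cassandra are implemented; the `Cluster` and `StoreNode` classes and the `mode` flag are names invented for this example.

```javascript
// Toy model: two replicas of one key-value store. Not a real database.
class StoreNode {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
}

class Cluster {
  // mode is "CP" or "AP" (a hypothetical flag for this sketch)
  constructor(mode) {
    this.mode = mode;
    this.nodes = { A: new StoreNode("A"), B: new StoreNode("B") };
    this.partitioned = false;
  }

  write(nodeName, key, value) {
    const local = this.nodes[nodeName];
    const peer = this.nodes[nodeName === "A" ? "B" : "A"];
    if (!this.partitioned) {
      // Healthy network: the write reaches both nodes before it is confirmed.
      local.data.set(key, value);
      peer.data.set(key, value);
      return { ok: true };
    }
    if (this.mode === "CP") {
      // CP: the peer cannot confirm, so the write is refused.
      return { ok: false, error: "unavailable during partition" };
    }
    // AP: accept locally; the peer does not see it until the partition heals.
    local.data.set(key, value);
    return { ok: true };
  }

  read(nodeName, key) {
    return this.nodes[nodeName].data.get(key);
  }
}

function demo(mode) {
  const cluster = new Cluster(mode);
  cluster.write("A", "username", "Alice");
  cluster.partitioned = true; // the link between the nodes goes down
  const result = cluster.write("B", "username", "Bob");
  return {
    writeAccepted: result.ok,
    nodeA: cluster.read("A", "username"),
    nodeB: cluster.read("B", "username"),
  };
}

console.log(demo("CP")); // write refused, both nodes still say "Alice"
console.log(demo("AP")); // write accepted, node A says "Alice", node B says "Bob"
```

In CP mode the European user gets an error but the two copies never diverge. In AP mode the write succeeds and the copies disagree until something reconciles them.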
Where Real Databases Sit
Where do real-world databases fall on the CAP spectrum?
No database is purely CP or purely AP. Most offer configurable consistency:
| Database | Default Position | Notes |
|---|---|---|
| PostgreSQL | CP | Strong consistency, single-leader replication |
| MySQL | CP | Similar to PostgreSQL |
| MongoDB | CP | Writes go to primary, reads can be from secondaries |
| Cassandra | AP | Eventual consistency by default, tunable per query |
| DynamoDB | AP | Eventual consistency by default, strong consistency available per read |
| CockroachDB | CP | Distributed SQL with strong consistency |
Beyond CAP: PACELC
An extension of the CAP theorem. Consider trade-offs not just during failures, but during normal operation as well.
CAP only describes behavior during a partition. But what about normal operation? The PACELC theorem extends CAP:
If there is a Partition, choose between Availability and Consistency.
Else (normal operation), choose between Latency and Consistency.
This captures an important reality: even when everything is working fine, there's a trade-off between how fast you respond and how consistent the data is.
- Strong consistency requires coordination between nodes → higher latency
- Eventual consistency responds from the nearest node → lower latency
| System | During Partition (PAC) | Normal Operation (ELC) |
|---|---|---|
| DynamoDB | PA (available) | EL (low latency) |
| Cassandra | PA (available) | EL (low latency) |
| PostgreSQL | PC (consistent) | EC (consistent) |
| CockroachDB | PC (consistent) | EC (consistent) |
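The latency side of the trade-off can be sketched with a little arithmetic. The example below assumes a read is sent to all replicas in parallel and that a consistent read waits for a quorum of answers; real systems differ (some route consistent reads to a leader instead), and the round-trip times are made-up numbers.

```javascript
// Round-trip times in milliseconds from a client to each replica (made-up numbers).
function eventualReadLatency(rttsMs) {
  // Any single replica may answer, so the nearest one sets the latency.
  return Math.min(...rttsMs);
}

function quorumReadLatency(rttsMs, quorumSize) {
  // The read completes once the quorumSize-th fastest replica has answered.
  const sorted = [...rttsMs].sort((a, b) => a - b);
  return sorted[quorumSize - 1];
}

const rtts = [2, 45, 90]; // same zone, nearby region, distant region
console.log(eventualReadLatency(rtts));  // 2
console.log(quorumReadLatency(rtts, 2)); // 45
```

Nothing has failed here, yet the coordinated read is more than 20 times slower in this example. That is the "Else" half of PACELC.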
Real-World Case Study: DynamoDB's Tunable Consistency
DynamoDB is designed to give developers choices within the constraints of the CAP theorem.
Amazon DynamoDB is one of the best examples of how modern databases handle the CAP trade-off pragmatically — by letting developers choose per operation.
The Design
DynamoDB replicates data across 3 availability zones in a region. Every write goes to a quorum (at least 2 out of 3 nodes must confirm).
For reads, DynamoDB offers two modes:
Eventually Consistent Read (default):
- Reads from any replica — might return slightly stale data
- Lower latency, higher throughput
- Inconsistency window is typically under 1 second
Strongly Consistent Read (opt-in):
- Reads from the leader node that has the latest write
- Guaranteed to return the most recent data
- Higher latency, uses more read capacity
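The two read modes can be illustrated with a toy replica group. This is a simplified model based only on the description above, not DynamoDB's actual implementation: the `ReplicaGroup` class, the `replicaIndex` option, and the `catchUp` method are all invented for the sketch.

```javascript
// Toy replica group: one leader, two followers. Assumption for this sketch:
// a write is acknowledged once the leader and one follower have applied it,
// and the remaining follower catches up later.
class ReplicaGroup {
  constructor() {
    this.replicas = [new Map(), new Map(), new Map()]; // index 0 is the leader
    this.pending = []; // writes not yet applied on the lagging follower
  }

  put(key, value) {
    this.replicas[0].set(key, value); // leader
    this.replicas[1].set(key, value); // first follower: 2 out of 3 reached
    this.pending.push([key, value]);  // second follower lags behind
  }

  // replicaIndex stands in for "whichever replica the request happens to reach".
  get(key, { consistentRead = false, replicaIndex = 0 } = {}) {
    const source = consistentRead ? this.replicas[0] : this.replicas[replicaIndex];
    return source.get(key);
  }

  catchUp() {
    for (const [key, value] of this.pending) this.replicas[2].set(key, value);
    this.pending = [];
  }
}

const group = new ReplicaGroup();
group.put("cart", "v1");
group.catchUp();
group.put("cart", "v2"); // replica 2 has not applied this yet

const stale = group.get("cart", { replicaIndex: 2 });
const strong = group.get("cart", { consistentRead: true, replicaIndex: 2 });
group.catchUp();
const afterCatchUp = group.get("cart", { replicaIndex: 2 });
console.log(stale, strong, afterCatchUp); // v1 v2 v2
```

The eventually consistent read is only wrong during the short window before the lagging replica catches up, which is the inconsistency window mentioned above.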
Why This Matters
This design lets you choose the right trade-off per API call:
```javascript
// Shopping cart - eventual consistency is fine
getItem({ ConsistentRead: false }) // fast, might be slightly stale
// Payment processing - need strong consistency
getItem({ ConsistentRead: true })  // slower, but guaranteed current
```
You don't have to choose CP or AP for your entire system. You choose per operation based on what that specific data needs.
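One way to keep that choice explicit is to decide the consistency level in a single place. The sketch below only builds the request input and does not call AWS. `TableName`, `Key`, and `ConsistentRead` are GetItem request parameters; the operation names, table names, and key attributes are hypothetical.

```javascript
// Hypothetical list of operations that must not see stale data.
const STRONG_READ_OPERATIONS = new Set(["payment", "inventoryCheck"]);

// Builds the input for a DynamoDB GetItem call. Passing it to an SDK client
// is left out so the example stays self-contained.
function buildGetItemInput(operation, tableName, key) {
  return {
    TableName: tableName,
    Key: key,
    ConsistentRead: STRONG_READ_OPERATIONS.has(operation),
  };
}

const cartRead = buildGetItemInput("shoppingCart", "Carts", { userId: { S: "u-1" } });
const paymentRead = buildGetItemInput("payment", "Payments", { paymentId: { S: "p-1" } });
console.log(cartRead.ConsistentRead, paymentRead.ConsistentRead); // false true
```

Keeping the list of strong-read operations short and visible makes it harder for strong consistency to spread to reads that don't need it.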
The Lesson
The CAP theorem isn't about choosing one camp forever. Modern databases give you a spectrum of consistency options. The skill is knowing which level of consistency each part of your system actually needs — and not paying for stronger consistency than necessary.
Sources: Understanding Eventual Consistency in DynamoDB (Alex DeBrie), AWS Whitepaper: CAP Theorem
Practice Questions
Q1: Fundamentals
Fundamentals: Judging CP vs AP.
Question:
You're building two systems: (1) a bank's transfer service where users send money to each other, and (2) a social media "like" counter that shows how many likes a post has. For each system, would you choose CP or AP? Why?
Hints
- What happens if the bank transfer system shows inconsistent balances?
- What happens if the like counter is off by a few for a few seconds?
- Think about the cost of inconsistency for each system
Answer
Bank transfer service → CP
If User A has $100 and sends $100 to User B, and a network partition occurs:
- AP: Both nodes might accept the transfer, and User A could send $100 twice (total $200 sent from a $100 account). This is a real financial loss.
- CP: During the partition, the transfer is rejected. User A sees an error: "Transfer temporarily unavailable." Inconvenient, but no money is lost.
The cost of inconsistency (lost money) is far higher than the cost of temporary unavailability.
Like counter → AP
If a post has 10,000 likes and 5 new likes come in during a partition:
- AP: Different nodes might show 10,003 and 10,002, depending on which likes each one received. Both are "wrong" but close enough. Nobody notices.
- CP: Some users can't see the like count at all during the partition. This hurts user experience for no meaningful benefit.
The cost of inconsistency (off by a few likes) is negligible. The cost of unavailability (broken UI) is worse.
Key insight: The consistency requirement depends on the business cost of being wrong. Financial data → high cost → CP. Social engagement metrics → low cost → AP.
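The double-spend case is easy to reproduce in a toy model. The function below is a sketch, not a real ledger: it assumes the account is replicated on two nodes that cannot reach each other, and that the same transfer request arrives at both.

```javascript
// Toy model of one account replicated on two nodes during a partition.
function transferDuringPartition(mode, startingBalance, amount) {
  // Each node holds its own copy of the balance and cannot see the other.
  const nodes = [
    { balance: startingBalance, sent: 0 },
    { balance: startingBalance, sent: 0 },
  ];
  const results = nodes.map((node) => {
    if (mode === "CP") return "rejected"; // cannot coordinate, so refuse
    if (node.balance < amount) return "rejected";
    node.balance -= amount;
    node.sent += amount;
    return "accepted";
  });
  const totalSent = nodes.reduce((sum, node) => sum + node.sent, 0);
  // After the partition heals, every accepted debit applies to the one real account.
  return { results, totalSent, reconciledBalance: startingBalance - totalSent };
}

const ap = transferDuringPartition("AP", 100, 100);
const cp = transferDuringPartition("CP", 100, 100);
console.log(ap.totalSent, ap.reconciledBalance); // 200 -100
console.log(cp.totalSent, cp.reconciledBalance); // 0 100
```

Each node's local check passes in AP mode because each sees the full $100, which is exactly why the account ends up overdrawn.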
Q2: Applied
Applied: Designing for consistency levels.
Question:
You're building an e-commerce platform using DynamoDB. For each of the following operations, would you use eventually consistent reads or strongly consistent reads? Explain why.
- Displaying product recommendations on the homepage
- Checking inventory before confirming a purchase
- Loading a user's order history page
- Verifying a discount coupon hasn't been used already
Hints
- For each operation, ask: what's the worst case if the data is 1 second stale?
- Some operations have financial consequences, others don't
Answer
1. Product recommendations → Eventually consistent
- Stale by 1 second? Doesn't matter. Showing a slightly outdated recommendation list has zero business impact.
- Benefit: Lower latency, better user experience, lower cost.
2. Inventory check before purchase → Strongly consistent
- If inventory shows "5 available" but it's actually 0 (stale data), you'd sell a product you don't have → overselling, customer complaints, refund costs.
- The cost of being wrong is high. Pay the extra latency.
3. Order history → Eventually consistent
- If a user's most recent order shows up 1 second late on the history page, they won't notice.
- Exception: Right after placing an order, use a strongly consistent read for that first page load (DynamoDB's way of getting read-after-write behavior) to avoid "where's my order?" confusion.
4. Coupon verification → Strongly consistent
- If the coupon was just used by another user, but your read is stale, you'd accept a used coupon → financial loss.
- Same reasoning as inventory: financial consequence → strong consistency.
Summary:
| Operation | Consistency | Reason |
|---|---|---|
| Recommendations | Eventual | No business impact if stale |
| Inventory check | Strong | Overselling risk |
| Order history | Eventual | Low impact, optimize for speed |
| Coupon verification | Strong | Financial loss if stale |
Key insight: Default to eventual consistency (cheaper, faster) and only use strong consistency where stale data has real business consequences.
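The "cheaper" part can be estimated. In DynamoDB's provisioned capacity model, a strongly consistent read of an item up to 4 KB consumes one read capacity unit and an eventually consistent read consumes half of one, with larger items rounded up in 4 KB steps. The sketch below applies that rule to the four operations above. The traffic figures and item sizes are invented for illustration.

```javascript
const READ_UNIT_BYTES = 4 * 1024;

// Read capacity units consumed by a single read of one item.
function readCapacityUnits(itemSizeBytes, consistentRead) {
  const blocks = Math.ceil(itemSizeBytes / READ_UNIT_BYTES);
  return consistentRead ? blocks : blocks / 2;
}

// Hypothetical per-second traffic for the four operations.
const workload = [
  { name: "recommendations", readsPerSecond: 1000, itemSizeBytes: 2048, consistentRead: false },
  { name: "inventoryCheck", readsPerSecond: 50, itemSizeBytes: 512, consistentRead: true },
  { name: "orderHistory", readsPerSecond: 200, itemSizeBytes: 6144, consistentRead: false },
  { name: "couponCheck", readsPerSecond: 20, itemSizeBytes: 256, consistentRead: true },
];

function totalReadCapacity(ops, forceStrong = false) {
  return ops.reduce(
    (sum, op) =>
      sum +
      op.readsPerSecond *
        readCapacityUnits(op.itemSizeBytes, forceStrong || op.consistentRead),
    0
  );
}

const mixed = totalReadCapacity(workload);
const allStrong = totalReadCapacity(workload, true);
console.log(mixed, allStrong); // 770 1470
```

With these made-up numbers, using strong consistency everywhere would almost double the read capacity needed, even though only two low-traffic operations benefit from it.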
Q3: Deep Dive
Deep Dive: Specific behavior during a network partition.
Question:
Your application uses a database replicated across 2 data centers (US-East and US-West). A network partition occurs — the two data centers can't communicate. During the partition, a user in US-East updates their email from "old@example.com" to "new@example.com", and at the same time, a user in US-West reads that user's profile. After the partition heals, how do CP and AP systems handle this differently? What about the specific case where both data centers accept conflicting writes to the same record?
Hints
- In a CP system, which data center can accept writes during the partition?
- In an AP system, both accept writes — but what if they conflict?
- How do you resolve "US-East says email is X, US-West says email is Y"?
Answer
CP System Behavior:
Only one data center (the one with the primary/leader) accepts writes. If the primary is in US-East:
- US-East: Write succeeds (email updated to "new@example.com")
- US-West: Write attempts are rejected. Reads fail too, unless the system is configured to allow reads that may be stale (which gives up strict consistency for those reads).
- After partition heals: US-West catches up. No conflicts to resolve.
Simple, but US-West users are impacted during the partition.
AP System Behavior:
Both data centers accept writes independently:
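- US-East: Email updated to "new@example.com"
- US-West: The profile read succeeds but returns the stale "old@example.com", because the update has not arrived yet
- US-West (conflicting-write case): A different write changes the email to "other@example.com"
- Both writes succeed locally, so the two data centers hold conflicting values when the partition heals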
Conflict Resolution Strategies:
1. Last-Write-Wins (LWW)
- Each write has a timestamp. The latest timestamp wins.
- Simple, but can lose data (the "losing" write is silently discarded)
- Used by: Cassandra, DynamoDB global tables
2. Application-level resolution
- The database stores both conflicting versions
- The application decides how to merge them on the next read
- More complex, but no data loss
- Used by: CouchDB, Riak
3. CRDTs (Conflict-free Replicated Data Types)
- Data structures designed to be merged automatically without conflicts
- Example: a counter CRDT where each data center tracks its own increments. Merging takes the higher count for each data center, and the counter's value is the sum of those counts
- Limited to specific data types, but elegant when applicable
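Key insight: Conflict resolution is the hidden cost of AP systems. "Eventual consistency" sounds simple, but how you eventually become consistent matters enormously. For values where the newest write should simply replace the old one (a status flag, a last-seen timestamp), LWW works. For counters, LWW would drop concurrent increments, so a CRDT is the better fit. For complex data (user profiles), you may need application-level resolution.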
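Two of these strategies are small enough to sketch directly. The code below shows a last-write-wins merge and a grow-only counter, one of the simplest CRDTs, in which each data center increments only its own slot. The timestamps and node names are made up, and real systems also have to deal with clock skew, which this sketch ignores.

```javascript
// Last-write-wins: keep whichever version carries the larger timestamp.
function mergeLww(a, b) {
  return b.timestamp > a.timestamp ? b : a;
}

// Grow-only counter: each node increments only its own slot.
function increment(counter, nodeId, amount = 1) {
  return { ...counter, [nodeId]: (counter[nodeId] ?? 0) + amount };
}

// Merge takes the per-node maximum, so merging the same state twice is harmless.
function mergeCounters(a, b) {
  const merged = { ...a };
  for (const [nodeId, count] of Object.entries(b)) {
    merged[nodeId] = Math.max(merged[nodeId] ?? 0, count);
  }
  return merged;
}

// The counter's value is the sum over all nodes.
function counterValue(counter) {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}

// Conflicting email writes from the scenario above.
const east = { email: "new@example.com", timestamp: 1700000005 };
const west = { email: "other@example.com", timestamp: 1700000003 };
const winner = mergeLww(east, west);

// Likes counted independently on each side of a partition.
const eastLikes = increment({}, "us-east", 3);
const westLikes = increment({}, "us-west", 2);
const mergedLikes = mergeCounters(eastLikes, westLikes);
console.log(winner.email, counterValue(mergedLikes)); // new@example.com 5
```

The LWW merge silently drops "other@example.com", while the counter merge keeps every increment. That difference is why the choice of strategy depends on the data type.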
Key Takeaways
Summary of key lessons.
- CAP is really about CP vs AP: Partition tolerance isn't optional in distributed systems. The real question is: when the network fails, do you prioritize consistency or availability?
- Most systems aren't purely CP or AP: Modern databases offer tunable consistency. Choose the right level per operation, not per system.
- Think about the business cost of inconsistency: Bank balances need strong consistency. Like counters don't. Let the business impact guide your technical decision.
- PACELC extends CAP: Even during normal operation, there's a latency vs consistency trade-off. Eventual consistency is faster. Strong consistency is safer. Default to eventual, upgrade to strong only where needed.