HomeStackProjExpSysArtArch
Articles

CAP Theorem Explained: The Tradeoffs Behind Distributed Systems

0%
All Articles
System Design

CAP Theorem Explained:
The Tradeoffs Behind Distributed Systems

The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition Tolerance. Understanding what this actually means — and what it doesn't — is fundamental to every architectural decision in distributed systems.

$Ibrahim YemiMay 24, 202616 min read
Distributed SystemsCAP TheoremConsistencyAvailabilityArchitecture

Every distributed system designer faces the same uncomfortable reality: you cannot build a distributed data store that is simultaneously perfectly consistent, always available, and tolerant of network failures. You must choose which guarantee to sacrifice, and that choice defines how your system behaves under the conditions that matter most in production.

The CAP theorem formalises this constraint. Stated formally by Eric Brewer in 2000 and later proved by Gilbert and Lynch in 2002, the theorem holds that in the presence of a network partition, a distributed system must choose between consistency and availability — it cannot have both.

The theorem sounds deceptively simple. In practice, it is one of the most misapplied concepts in distributed systems design. Engineers cite "we chose AP" or "we chose CP" without fully understanding what those labels mean, what they do not mean, and what tradeoffs actually follow from that choice in production code.


CAP is not an academic curiosity. It directly governs what guarantees you can offer to your users and what failure modes you must design for.

If you build a payment processing system on an AP database without understanding the consistency implications, you risk double charges during partitions. If you build a user-facing product feature on a CP database without understanding the availability implications, you risk returning errors to users during normal network hiccups that don't affect your primary database.

The wrong choice creates incidents. The right choice, made deliberately, gives you a system that degrades gracefully under exactly the conditions that would otherwise cause your worst outages.

Beyond CAP: PACELC

The PACELC theorem extends CAP by observing that even when the system is not partitioned, there is still a tradeoff between latency and consistency. A system that replicates writes synchronously to maintain consistency pays a latency cost on every write, even during normal operation. CAP alone doesn't capture this.


Consider a distributed inventory system for an e-commerce platform with nodes in three regions: US-East, EU-West, and AP-Southeast. Writes from any region update inventory counts globally.

A network partition occurs: US-East loses connectivity to EU-West for 45 seconds.

If the system chooses Consistency (CP):

  • EU-West nodes recognise they cannot confirm their data is current
  • They stop accepting reads and writes until connectivity is restored
  • Users in Europe see service errors or degraded mode for 45 seconds
  • When connectivity resumes, all nodes have consistent data

If the system chooses Availability (AP):

  • EU-West nodes continue accepting reads and writes
  • A product showing 3 units in stock in US-East may show 3 units in EU-West too
  • A customer in Europe purchases 2 units; simultaneously a customer in the US purchases 2 units
  • When connectivity resumes, reconciliation reveals 4 units were sold from a stock of 3
  • The system must handle the oversell: cancel one order, issue a refund, notify a customer

Neither outcome is pleasant. The question is which outcome is more acceptable for this specific business domain. For inventory, most e-commerce platforms choose AP with reconciliation logic — occasional oversells are cheaper than widespread checkout errors.


Consistency (C) in CAP means linearizability: every read returns the most recently written value. If a write completes, all subsequent reads from any node in the cluster must reflect that write. This is a strong guarantee — stronger than the "consistency" in ACID, which refers to data integrity constraints rather than read-after-write guarantees.

Availability (A) means every request to a non-failing node returns a response — not an error, not a timeout. The response may not be the most recent value, but it is a value. Availability in CAP is about liveness: the system keeps responding.

Partition Tolerance (P) means the system continues operating even when network messages between nodes are lost or delayed. A network partition is a split in the cluster where some nodes cannot communicate with others.

The most common misreading of CAP is treating it as a three-way choice. In practice, you cannot choose to drop partition tolerance. Network partitions happen. A cable gets cut. A switch fails. A cloud provider has a cross-AZ networking event. If your distributed system has more than one node, partitions will occur, and your system must have a defined behaviour when they do.

The real choice is binary: when a partition occurs, do you sacrifice consistency or availability?

This reframes the theorem: distributed systems are either CP (consistent but may become unavailable during partitions) or AP (available but may return stale data during partitions). There is no CA system in a world with network failures.

A CP system prioritises returning correct data over returning any data. When a partition is detected, nodes in the minority partition (the side that cannot confirm quorum) stop serving requests or return errors.

Canonical examples:

  • HBase — refuses writes that cannot be acknowledged by the required number of replicas
  • Zookeeper — stops serving clients if it loses quorum contact with the majority of the ensemble
  • Etcd — uses Raft consensus; a minority partition stops accepting writes until quorum is restored

When CP is the right choice:

  • Financial ledgers where two nodes must never hold different balances
  • Distributed locks where two nodes must never grant the same lock
  • Configuration stores where stale configuration could cause system-wide misconfiguration
  • Leader election systems where split-brain must be impossible

The production implication: CP systems trade availability SLOs for correctness guarantees. Your SLA cannot promise five nines if your CP database refuses writes during network events that occur multiple times per year.

An AP system prioritises returning a response over returning a guaranteed-correct response. During a partition, nodes in both halves continue operating independently, potentially diverging in state. When the partition heals, the system must reconcile divergent state through a conflict resolution strategy.

Canonical examples:

  • Cassandra — with quorum settings below majority, nodes in a minority partition continue accepting reads and writes
  • DynamoDB — eventually consistent reads may return stale data; strongly consistent reads sacrifice some availability
  • CouchDB — uses multi-version concurrency control with explicit conflict resolution
  • DNS — a foundational AP system; propagation is eventual, not immediate

When AP is the right choice:

  • User session stores where serving a slightly stale session is better than a login error
  • Product catalogue reads where showing a price that's 30 seconds old is acceptable
  • Social media feeds where eventual consistency is the expected model
  • Metrics and analytics where approximate values are acceptable

The production implication: AP systems push conflict resolution into application code. Someone must define what "last write wins," "highest value wins," or "merge conflicts" means for each data type in your domain.

Binary AP/CP labelling obscures an important reality: consistency is a spectrum, not a binary. Different operations within the same system can have different consistency requirements.

Consistency LevelDescriptionTypical Use Case
LinearizableReads always see the most recent writeBank balance, distributed locks
SequentialAll nodes see operations in the same orderReplicated state machines
CausalOperations that are causally related appear in orderComment replies appear after parent posts
Read-your-writesA client always reads its own writesUser profile updates visible immediately to the writer
Monotonic readOnce you read a value, you never read an older valuePaginated feeds
EventualAll nodes converge to the same value eventuallyDNS propagation, CDN caching

Most production systems operate somewhere on this spectrum rather than at the extremes. DynamoDB's "eventually consistent reads" provide eventual consistency; its "strongly consistent reads" provide linearizability at the cost of latency and cost.

Many distributed databases allow tuning consistency per operation using quorum settings. Cassandra's consistency levels are the canonical example:

N = number of replica nodes
W = number of replicas that must acknowledge a write
R = number of replicas that must respond to a read

Strong consistency: W + R > N
Eventual consistency: W + R ≤ N

With N=3 replicas:

WRGuarantee
31Writes are slow (all 3 must confirm); reads are fast
13Writes are fast; reads are slow (all 3 must respond)
22W + R = 4 > 3 — strong consistency with balanced latency
11Eventual consistency — fastest, least durable

This tunability means the AP/CP label applies per operation, not per system. A Cassandra cluster configured with QUORUM reads and writes behaves as CP. The same cluster with ONE reads and writes behaves as AP.


Strong consistency requires coordination between replicas. Coordination means network round-trips. Network round-trips add latency. In a geographically distributed system, a linearizable write to a US-East leader that must be acknowledged by a replica in EU-West adds at least the transatlantic round-trip latency (~70ms) to every write.

This is the latency tax of consistency. Many systems pay it only for the subset of operations that genuinely require it, and accept eventual consistency for the rest.

AP systems accept that reads may return stale data. The duration of staleness depends on replication lag — typically milliseconds in a healthy network, but unbounded during partitions. Systems designed for AP must explicitly model what stale reads mean for their use cases:

  • A product price that's 100ms stale: acceptable
  • A bank balance that's 100ms stale: not acceptable in most regulatory environments
  • A loyalty points balance that's 5 minutes stale: acceptable for display, not acceptable for redemption

AP systems require conflict resolution logic. Two nodes accepting conflicting writes must eventually converge. The convergence strategy adds operational complexity:

Last-write-wins (LWW) uses timestamps to resolve conflicts. The write with the latest timestamp survives. Simple, but vulnerable to clock skew and silently discards concurrent updates.

Multi-version concurrency control (MVCC) stores conflicting versions and surfaces the conflict to application code for resolution. Correct, but requires the application to implement merging logic.

CRDTs (Conflict-free Replicated Data Types) are data structures mathematically guaranteed to converge without conflicts under any merge order. They work for specific types (counters, sets, maps) but not for arbitrary application state.


Here is how a Laravel application can express explicit consistency requirements when using a multi-region DynamoDB setup:

class InventoryRepository
{
    // Strong consistency read — uses quorum, higher latency, correct for checkout
    public function getStockForCheckout(string $productId): int
    {
        $result = $this->dynamodb->getItem([
            'TableName'      => 'inventory',
            'Key'            => ['product_id' => ['S' => $productId]],
            'ConsistentRead' => true, // Linearizable read
        ]);

        return (int) $result['Item']['stock_count']['N'];
    }

    // Eventual consistency read — fast, acceptable for product page display
    public function getStockForDisplay(string $productId): int
    {
        $result = $this->dynamodb->getItem([
            'TableName'      => 'inventory',
            'Key'            => ['product_id' => ['S' => $productId]],
            'ConsistentRead' => false, // Eventually consistent read
        ]);

        return (int) $result['Item']['stock_count']['N'];
    }

    // Conditional write — prevents oversell even under eventual consistency
    public function decrementStock(string $productId, int $quantity): bool
    {
        try {
            $this->dynamodb->updateItem([
                'TableName'                => 'inventory',
                'Key'                      => ['product_id' => ['S' => $productId]],
                'UpdateExpression'         => 'SET stock_count = stock_count - :qty',
                'ConditionExpression'      => 'stock_count >= :qty',
                'ExpressionAttributeValues'=> [
                    ':qty' => ['N' => (string) $quantity],
                ],
            ]);
            return true;
        } catch (ConditionalCheckFailedException) {
            return false; // Insufficient stock
        }
    }
}

The decrementStock method uses a conditional write that enforces stock_count >= quantity atomically at the DynamoDB level. Even with eventual consistency on reads, the write operation itself is atomic — DynamoDB guarantees that a conditional write either succeeds completely or fails completely on a single item. This is a key pattern: use eventual consistency for reads where staleness is acceptable, but use conditional/atomic writes to enforce invariants at the persistence layer.


CAP properties are per-operation, not per-system. A system that uses MySQL with a single primary handles writes as CP and reads as either consistent (primary reads) or eventually consistent (replica reads). The architecture of the system doesn't fix the CAP properties — the configuration of each operation does.

In a healthy system with low replication lag, eventual consistency resolves in milliseconds. After a partition heals, it may take seconds or minutes. Systems that assume "eventually consistent" means "probably consistent right now" produce bugs that only surface during network events.

AP systems must reconcile divergent writes when a partition heals. If you haven't defined a conflict resolution strategy before building an AP system, you will be defining one reactively during a production incident while divergent data accumulates.

CAP applies to distributed systems. A single-node MySQL instance is not subject to CAP — it is a single point of truth with ACID transactions. Teams sometimes apply AP thinking to single-node systems unnecessarily, accepting eventual consistency where a transaction would have worked.


Geographically distributed systems face the most extreme consistency/latency tradeoffs. Synchronous replication across regions adds cross-region latency to every write — often 50–150ms. Most multi-region systems use asynchronous replication (accepting eventual consistency) for performance, with synchronous replication reserved for critical operations.

MySQL and PostgreSQL read replicas are AP: reads from replicas may be stale by the replication lag (typically milliseconds, but can grow under heavy write load). Systems that route reads to replicas must handle the case where a user writes data and immediately reads it from a lagging replica and sees the old value. The read-your-writes consistency model — routing a user's reads to the primary for a short window after they write — is a common mitigation.

CP systems use consensus algorithms (Raft, Paxos, Viewstamped Replication) to agree on values. Consensus requires a majority quorum, which means the number of network round-trips scales with cluster size. Large CP clusters (more than 5–7 nodes) typically use hierarchical quorum or leader-based replication to bound coordination cost.


The CAP theorem is a constraint, not a recipe. It tells you what you cannot have; it doesn't tell you what to build. The productive use of CAP is as a forcing function for explicit architectural decisions:

  • Identify every read path — what consistency level does this operation actually require?
  • Identify every write path — what happens if two nodes accept conflicting writes?
  • Define your partition behaviour — during a network event, do you serve errors or potentially stale data?
  • Document your conflict resolution strategy — before the partition heals, not after

The most common real-world answer is not "CP" or "AP" but "AP with compensation" — accept eventual consistency, enforce invariants through conditional writes and application-level conflict detection, and build reconciliation processes for the cases that slip through. That model handles most business domains correctly while delivering the availability that user-facing systems require.

The systems that break production SLAs are not the ones that chose the wrong CAP label. They're the ones that chose a label without understanding what it meant for their specific operations, and found out during an incident.

— end of articleBack to all articles

Work with me

Designing a complex system?

Available for engagements in distributed systems architecture, backend design, and engineering leadership.

Get in touch