CAP Theorem (also known as Brewer’s Theorem) states that in a distributed system, it is impossible to simultaneously guarantee Consistency, Availability, and Partition Tolerance. You can only achieve two of these three properties at the same time.
Here is a breakdown of the three components:
-
Consistency (C):
Every read receives the most recent write. All nodes in the system return the same, up-to-date data. -
Availability (A):
Every request (read or write) receives a response, even if some nodes are down. -
Partition Tolerance (P):
The system continues to operate even if there is a network partition (communication failure between nodes).
Since network partitions (P) are unavoidable in distributed systems, the CAP theorem effectively forces a trade-off between Consistency (C) and Availability (A). Based on this, distributed databases are classified into two main types:
- CP (Consistency + Partition Tolerance): Prioritizes consistency over availability.
- AP (Availability + Partition Tolerance): Prioritizes availability over consistency.
Databases Supporting CAP Combinations
CAP Property | Database Type | Examples | Explanation |
---|---|---|---|
CP (Consistency + Partition Tolerance) | Strongly Consistent Databases | – MongoDB (in strong consistency mode) – HBase – Redis (Single-node) – Zookeeper | These systems prioritize consistency over availability when a partition occurs. They ensure all nodes agree on the latest data, but might sacrifice availability. |
AP (Availability + Partition Tolerance) | Highly Available Databases | – DynamoDB – Cassandra – CouchDB – Riak | These systems prioritize availability over consistency during network partitions. They ensure the system remains available, but the data may not be up-to-date (eventual consistency). |
CA (Consistency + Availability) | Not possible in partition-tolerant systems | – Traditional Relational Databases (single node): PostgreSQL, MySQL | CA systems are achievable only if there are no network partitions. This is generally possible on a single-node database, where partitions don’t occur. |
Why is CA Not Possible in Distributed Systems?
In a distributed environment, network partitions (P) are given because systems can fail or experience delays. Therefore, the CA combination cannot exist in a distributed database because maintaining consistency and availability simultaneously during a partition is impossible.
Summary of CAP Trade-offs
- CP: Strong consistency, but the system may become unavailable during partitions.
- Example: HBase, Zookeeper, etc.
- AP: High availability but eventual consistency.
- Example: DynamoDB, Cassandra, CouchDB.
- CA: Only achievable in non-distributed systems (no network partitions).
- Example: Single-node PostgreSQL or MySQL.