BI - Data Warehouse/ Lake/ Lakehouse

CAP Theorem

December 17, 2024

132

CAP Theorem (also known as Brewer’s Theorem) states that in a distributed system, it is impossible to simultaneously guarantee Consistency, Availability, and Partition Tolerance. You can only achieve two of these three properties at the same time.

Here is a breakdown of the three components:

Consistency (C):
Every read receives the most recent write. All nodes in the system return the same, up-to-date data.
Availability (A):
Every request (read or write) receives a response, even if some nodes are down.
Partition Tolerance (P):
The system continues to operate even if there is a network partition (communication failure between nodes).

Since network partitions (P) are unavoidable in distributed systems, the CAP theorem effectively forces a trade-off between Consistency (C) and Availability (A). Based on this, distributed databases are classified into two main types:

CP (Consistency + Partition Tolerance): Prioritizes consistency over availability.
AP (Availability + Partition Tolerance): Prioritizes availability over consistency.

Databases Supporting CAP Combinations

CAP Property	Database Type	Examples	Explanation
CP (Consistency + Partition Tolerance)	Strongly Consistent Databases	– MongoDB (in strong consistency mode) – HBase – Redis (Single-node) – Zookeeper	These systems prioritize consistency over availability when a partition occurs. They ensure all nodes agree on the latest data, but might sacrifice availability.
AP (Availability + Partition Tolerance)	Highly Available Databases	– DynamoDB – Cassandra – CouchDB – Riak	These systems prioritize availability over consistency during network partitions. They ensure the system remains available, but the data may not be up-to-date (eventual consistency).
CA (Consistency + Availability)	Not possible in partition-tolerant systems	– Traditional Relational Databases (single node): PostgreSQL, MySQL	CA systems are achievable only if there are no network partitions. This is generally possible on a single-node database, where partitions don’t occur.

Why is CA Not Possible in Distributed Systems?

In a distributed environment, network partitions (P) are given because systems can fail or experience delays. Therefore, the CA combination cannot exist in a distributed database because maintaining consistency and availability simultaneously during a partition is impossible.

Summary of CAP Trade-offs

CP: Strong consistency, but the system may become unavailable during partitions.
- Example: HBase, Zookeeper, etc.
AP: High availability but eventual consistency.
- Example: DynamoDB, Cassandra, CouchDB.
CA: Only achievable in non-distributed systems (no network partitions).
- Example: Single-node PostgreSQL or MySQL.

CAP Theorem

Databases Supporting CAP Combinations

Why is CA Not Possible in Distributed Systems?

Summary of CAP Trade-offs

EDITOR PICKS

Estimation for Agile Developers While Status Reporting to Waterfall Managers

5 Major Reasons Why So Many Companies Fail At Social Media

Best Practices for Distributed Or Remote Teams in the Age of...

POPULAR POSTS

How to use business objects @Prompt Variable to build flexible universes...

How to Merge Data from Multiple Data Providers in WEBIntelligence (webi)

How to Calculate Number Of Days in a Month or Month...

POPULAR CATEGORY

Practical Performance: Redis vs. Materialized Views

SPARK Interview Preparation Questions