Learning Objectives
- Understand the CAP theorem and its implications for blockchain
- Learn about network partitions and how systems respond (CP vs AP)
- Understand eventual vs strong consistency and why blockchains need strong consistency
- Understand partial synchrony and GST (Global Stabilization Time)—why BFT works despite FLP (Fischer–Lynch–Paterson impossibility)
- Distinguish safety vs liveness and how they trade off during partitions
- Understand quorum intersection: why 3f+1 and 2f+1 guarantee agreement
- Learn leader-based consensus, view/round, and why timeouts are needed for view change
- Relate fault tolerance (crash vs Byzantine) to protocol design
1. The CAP Theorem
What is CAP?
In a distributed system you cannot have all three at once:
- C – Consistency: Every read sees the latest write (all nodes agree on the same data at the same time).
- A – Availability: Every request receives a response (no request is refused).
- P – Partition tolerance: The system keeps working even when the network splits (messages are dropped or delayed between groups of nodes).
In practice, partition tolerance (P) is non-negotiable—networks do partition. So the real choice is:
- CP: When a partition happens, sacrifice availability—refuse some requests until the partition heals so that consistency is preserved.
- AP: When a partition happens, sacrifice consistency—keep answering requests so both sides can diverge; you get eventual consistency.
Blockchains and CAP: Many blockchains are CP-oriented: they prefer consistency (one agreed chain) and may not make progress during severe partitions. Some designs relax this for liveness (availability of new blocks) under partial synchrony.
2. Network Partitions
What is a Partition?
A network partition is when the network splits into two or more groups that cannot communicate with each other. Messages between groups are lost or delayed indefinitely.
Why It Matters
During a partition, nodes in different groups cannot agree in real time. So:
- CP systems may stop accepting writes (or block progress) in one or both sides until the partition heals, to avoid inconsistent state.
- AP systems allow both sides to keep accepting operations; when the partition heals, they must merge or resolve conflicts (eventual consistency).
BFT consensus assumes that eventually the network is synchronous enough (e.g. after GST (Global Stabilization Time)) so that a quorum can communicate. Until then, progress might stall (liveness), but safety (no two quorums commit different values) is preserved.
3. Eventual Consistency vs Strong Consistency
Strong Consistency
Every read sees the result of the latest completed write. All nodes agree on the same order and value. This is what consensus and BFT aim for: one agreed sequence of blocks.
Eventual Consistency
If no new writes are made, eventually all nodes will converge to the same state. Before that, different nodes might see different data. Used in many AP systems (e.g. Dynamo-style, CRDTs).
In Blockchain
Blockchains typically want strong consistency on the canonical chain: once a block is committed (e.g. via BFT), all honest nodes agree. Nakamoto consensus gives probabilistic agreement (longest chain); BFT gives deterministic agreement.
4. Fault Tolerance
Failure Modes
- Crash: A node stops responding (no malicious behavior).
- Byzantine: A node can behave arbitrarily (delay, drop, or send conflicting messages). BFT protocols tolerate up to f Byzantine nodes out of 3f+1.
- Omission: Messages are dropped (can be modeled as crash or network partition).
Why It Matters for hyperscale-rs
Hyperscale-rs uses BFT consensus, so it is designed to tolerate Byzantine faults (up to 1/3 of validators). Understanding partitions and CAP helps you see why the protocol may not make progress during a partition (CP-style), and why quorum sizes (2f+1) are chosen so that any two quorums overlap in at least one honest node.
5. Partial Synchrony and GST (Global Stabilization Time)
The FLP (Fischer–Lynch–Paterson) impossibility result says: in a fully asynchronous network, deterministic consensus cannot guarantee termination if even one node can fail. What does that mean in plain terms?
- Fully asynchronous: There is no bound on how long messages take to arrive. You can never tell whether a node is slow, the network is delayed, or a message was lost. So you can't safely assume "after 5 seconds, if I got no reply, the node is dead."
- Cannot guarantee termination: The protocol might never "decide"—i.e. never output a final agreed value. The nodes might keep exchanging messages forever without reaching consensus. FLP proves that this is unavoidable in the asynchronous model if one node can crash.
- If even one node can fail: The result holds with just one crash fault (a node that stops responding). It doesn't require Byzantine (malicious) behavior.
So how do BFT blockchains work? They relax the model: they assume the network is not fully asynchronous forever. That's where partial synchrony comes in.
Partial synchrony
BFT protocols assume partial synchrony: the network is asynchronous until some unknown point, called GST (Global Stabilization Time). After GST, message delays are bounded by a known Δ. We don't know when GST is—only that it eventually happens.
- Before GST: The protocol may not make progress (liveness at risk), but it must not commit conflicting values (safety preserved).
- After GST: Timeouts and retries eventually allow a quorum to communicate, so the protocol can make progress.
Why this matters for Hyperscale: The BFT crate assumes eventual synchrony. During a partition or bad network, you may see no new blocks (liveness); once the network recovers (GST (Global Stabilization Time) has passed), the protocol resumes. This is the standard assumption for HotStuff-style consensus.
6. Safety vs Liveness
Consensus protocols aim for two guarantees:
- Safety: No two honest nodes ever commit different values (e.g. different blocks at the same height). Once committed, it's final.
- Liveness: The system eventually makes progress—new blocks keep being committed (assuming at most f faults and, for most BFT, partial synchrony).
Under a partition, you typically preserve safety (no conflicting commits) but may lose liveness (no progress until the partition heals). BFT designs prioritize safety: they would rather stall than allow two quorums to commit different blocks.
In Hyperscale: Steps 7–11 in the transaction flow (proposer → vote → QC → commit) can stall during a partition; when the network recovers, the same protocol continues and no honest node has committed a conflicting block.
7. Quorum Intersection: Why 3f+1 and 2f+1
In BFT with n = 3f + 1 nodes, a quorum is any set of 2f + 1 nodes. The crucial property:
Any two quorums intersect in at least one honest node.
- There are at most f Byzantine nodes. So there are at least 2f+1 honest nodes.
- Any set of 2f+1 nodes (a quorum) must contain at least (2f+1) − f = f+1 honest nodes.
- Two different quorums (each of size 2f+1) have at most 2f nodes total outside their intersection. So they share at least (2f+1) + (2f+1) − (3f+1) = 2f+1 − f = f+1 nodes. At least one of those must be honest (since at most f are Byzantine). So any two quorums share at least one honest node.
Consequence: if one quorum commits block A and another commits block B (A ≠ B), they would need to share an honest node that voted for both—but an honest node votes only for one value. So two different blocks cannot both get a quorum. That's the core of BFT safety.
8. Leader-Based Consensus: View, Round, and Timeouts
BFT protocols like HotStuff are leader-based: in each view (or round), one node—the proposer/leader—is designated to propose a block. Validators then vote; when 2f+1 vote, a QC is formed. For the exact steps (i)–(v) of how a QC is formed and how it is carried in the next block, plus crate and line references in hyperscale-rs, see Consensus Algorithms: From Basics to BFT, section “How a QC is formed”.
What if the leader is faulty or partitioned?
If the leader doesn't send a block, or validators don't receive it, the round would never finish. So protocols use timeouts: if no QC is formed within some time bound (after GST (Global Stabilization Time)), nodes assume the leader is faulty and move to the next view, where a new leader (chosen deterministically, e.g. round-robin) proposes. This is a view change.
- Why time? Liveness requires progress; without a timeout, a faulty leader could stall the system forever. Partial synchrony (bounded delay after GST (Global Stabilization Time)) makes timeouts meaningful.
- Implicit vs coordinated: In HotStuff-2, view change is implicit—on timeout, the next leader simply proposes; no separate view-change vote. PBFT uses an explicit coordinated view-change phase.
9. Why This Matters for Hyperscale-rs
When you read the Hyperscale-rs codebase or the Transaction Flow module:
- CAP & partitions: The system is CP-oriented; during a partition it may not produce new blocks (no availability), but it won't commit conflicting blocks (consistency).
- Partial synchrony: The BFT crate assumes the network eventually obeys bounds; timeouts and round advancement depend on that.
- Safety vs liveness: The commit rule (e.g. two-chain) and quorum intersection ensure safety; view changes and timeouts aim for liveness.
- Quorum size 2f+1: Votes and QCs use this threshold so that any two quorums overlap in an honest node—this is why the protocol is safe.
10. Key Takeaways
- CAP: under partitions, you trade off consistency (CP) vs availability (AP); blockchains typically choose CP.
- Partial synchrony (GST (Global Stabilization Time)) is the assumption that lets BFT bypass FLP (Fischer–Lynch–Paterson impossibility): after some unknown time, the network is "good enough" for progress.
- Safety = no conflicting commits; liveness = eventual progress. During partitions, safety is preserved, liveness may be lost.
- Quorum intersection (3f+1, 2f+1): any two quorums share an honest node, so they cannot commit different values.
- Leader-based BFT uses views/rounds and timeouts for view change when the leader is faulty or partitioned.
- Fault tolerance (crash vs Byzantine) determines protocol design (2f+1 vs 3f+1, and whether we need quorum intersection).
12. Practical Assignment
Assignment: Distributed Systems and BFT
- In one paragraph, explain why a distributed system cannot guarantee consistency, availability, and partition tolerance at the same time. Give one CP and one AP example.
- Explain in your own words: what is partial synchrony (GST (Global Stabilization Time)), and why does it allow BFT to guarantee both safety and liveness despite the FLP (Fischer–Lynch–Paterson) impossibility result?
- State the quorum intersection property (why any two quorums of 2f+1 in a 3f+1 system share an honest node). In one sentence, why does this guarantee that two different blocks cannot both be committed?
- During a network partition, does a BFT system typically preserve safety, liveness, or both? Briefly explain.
- In your learning journal, note how Hyperscale-rs (or the transaction flow) reflects: CAP choice, partial synchrony, safety vs liveness, and quorum size.
Success Criteria:
- ✅ You can explain CAP, partial synchrony, and safety vs liveness
- ✅ You understand quorum intersection and why 3f+1 / 2f+1
- ✅ You can relate these concepts to BFT and Hyperscale-rs