Distributed Systems Fundamentals

⏱️ Duration: 2-2.5 hours 📊 Difficulty: Basic

Learning Objectives

1. The CAP Theorem

What is CAP?

In a distributed system you cannot have all three at once:

In practice, partition tolerance (P) is non-negotiable—networks do partition. So the real choice is:

Blockchains and CAP: Many blockchains are CP-oriented: they prefer consistency (one agreed chain) and may not make progress during severe partitions. Some designs relax this for liveness (availability of new blocks) under partial synchrony.

2. Network Partitions

What is a Partition?

A network partition is when the network splits into two or more groups that cannot communicate with each other. Messages between groups are lost or delayed indefinitely.

Why It Matters

During a partition, nodes in different groups cannot agree in real time. So:

BFT consensus assumes that eventually the network is synchronous enough (e.g. after GST (Global Stabilization Time)) so that a quorum can communicate. Until then, progress might stall (liveness), but safety (no two quorums commit different values) is preserved.

3. Eventual Consistency vs Strong Consistency

Strong Consistency

Every read sees the result of the latest completed write. All nodes agree on the same order and value. This is what consensus and BFT aim for: one agreed sequence of blocks.

Eventual Consistency

If no new writes are made, eventually all nodes will converge to the same state. Before that, different nodes might see different data. Used in many AP systems (e.g. Dynamo-style, CRDTs).

In Blockchain

Blockchains typically want strong consistency on the canonical chain: once a block is committed (e.g. via BFT), all honest nodes agree. Nakamoto consensus gives probabilistic agreement (longest chain); BFT gives deterministic agreement.

4. Fault Tolerance

Failure Modes

Why It Matters for hyperscale-rs

Hyperscale-rs uses BFT consensus, so it is designed to tolerate Byzantine faults (up to 1/3 of validators). Understanding partitions and CAP helps you see why the protocol may not make progress during a partition (CP-style), and why quorum sizes (2f+1) are chosen so that any two quorums overlap in at least one honest node.

5. Partial Synchrony and GST (Global Stabilization Time)

The FLP (Fischer–Lynch–Paterson) impossibility result says: in a fully asynchronous network, deterministic consensus cannot guarantee termination if even one node can fail. What does that mean in plain terms?

So how do BFT blockchains work? They relax the model: they assume the network is not fully asynchronous forever. That's where partial synchrony comes in.

Partial synchrony

BFT protocols assume partial synchrony: the network is asynchronous until some unknown point, called GST (Global Stabilization Time). After GST, message delays are bounded by a known Δ. We don't know when GST is—only that it eventually happens.

Why this matters for Hyperscale: The BFT crate assumes eventual synchrony. During a partition or bad network, you may see no new blocks (liveness); once the network recovers (GST (Global Stabilization Time) has passed), the protocol resumes. This is the standard assumption for HotStuff-style consensus.

6. Safety vs Liveness

Consensus protocols aim for two guarantees:

Under a partition, you typically preserve safety (no conflicting commits) but may lose liveness (no progress until the partition heals). BFT designs prioritize safety: they would rather stall than allow two quorums to commit different blocks.

In Hyperscale: Steps 7–11 in the transaction flow (proposer → vote → QC → commit) can stall during a partition; when the network recovers, the same protocol continues and no honest node has committed a conflicting block.

7. Quorum Intersection: Why 3f+1 and 2f+1

In BFT with n = 3f + 1 nodes, a quorum is any set of 2f + 1 nodes. The crucial property:

Any two quorums intersect in at least one honest node.

Consequence: if one quorum commits block A and another commits block B (A ≠ B), they would need to share an honest node that voted for both—but an honest node votes only for one value. So two different blocks cannot both get a quorum. That's the core of BFT safety.

8. Leader-Based Consensus: View, Round, and Timeouts

BFT protocols like HotStuff are leader-based: in each view (or round), one node—the proposer/leader—is designated to propose a block. Validators then vote; when 2f+1 vote, a QC is formed. For the exact steps (i)–(v) of how a QC is formed and how it is carried in the next block, plus crate and line references in hyperscale-rs, see Consensus Algorithms: From Basics to BFT, section “How a QC is formed”.

What if the leader is faulty or partitioned?

If the leader doesn't send a block, or validators don't receive it, the round would never finish. So protocols use timeouts: if no QC is formed within some time bound (after GST (Global Stabilization Time)), nodes assume the leader is faulty and move to the next view, where a new leader (chosen deterministically, e.g. round-robin) proposes. This is a view change.

9. Why This Matters for Hyperscale-rs

When you read the Hyperscale-rs codebase or the Transaction Flow module:

10. Key Takeaways

11. Quiz

12. Practical Assignment

Assignment: Distributed Systems and BFT

  1. In one paragraph, explain why a distributed system cannot guarantee consistency, availability, and partition tolerance at the same time. Give one CP and one AP example.
  2. Explain in your own words: what is partial synchrony (GST (Global Stabilization Time)), and why does it allow BFT to guarantee both safety and liveness despite the FLP (Fischer–Lynch–Paterson) impossibility result?
  3. State the quorum intersection property (why any two quorums of 2f+1 in a 3f+1 system share an honest node). In one sentence, why does this guarantee that two different blocks cannot both be committed?
  4. During a network partition, does a BFT system typically preserve safety, liveness, or both? Briefly explain.
  5. In your learning journal, note how Hyperscale-rs (or the transaction flow) reflects: CAP choice, partial synchrony, safety vs liveness, and quorum size.

Success Criteria:

  • ✅ You can explain CAP, partial synchrony, and safety vs liveness
  • ✅ You understand quorum intersection and why 3f+1 / 2f+1
  • ✅ You can relate these concepts to BFT and Hyperscale-rs