ACCORD: The newest consensus protocol in town
By: Sid
At this point, y'all know how I love to yap about the latest tech in town. During one of our paper reading sessions at the office, I wanted to present something along the lines of distributed systems and consensus protocols, and that's how I stumbled on ACCORD, part of CEP-15.
Every time I look at traditional consensus algorithms, I see the same painful pattern: they're either fast but have a single point of failure, or they're resilient but slow.
Today, I'm talking about Accord, a protocol that actually solves this, and honestly? It's kind of a masterpiece in systems design. I took away soooooo much from this paper.
let's begin!
The Problem We've Been Living With
Paxos (1998):
Consensus, but it needs 2 round-trips. Why? Because concurrent proposals from different replicas conflict, and you need another round to resolve them T~T
[fun fact: the original Paxos paper was written as a story about a fictional parliament. it actually solved the consensus problem, but the parliamentary metaphor buried the algorithm, so it was restated in a more readable form in 2001 as "Paxos Made Simple". the original is still considered one of the most confusing reads xD]
Multi-Paxos & Raft (2001, 2014):
they were fans of electing a leader to reach consensus.
that smart move cuts the consensus cycle down to 1 round-trip.
But now:
- Leader failure = service degradation until new leader elected
- Geographically distant clients must talk to a distant leader (latency nightmare lmao)
- Single leader = throughput bottleneck
EPaxos (Egalitarian Paxos):
"What if we make it leaderless again?" Great idea! But under maximum tolerated failures, it forces you into the slow path anyway.(how? ill show you)
Also, it needs identical operation histories across replicas—complex topological sorting required.
Sound familiar? Systems trying to solve one problem by creating another xD
[i suggest y'all read about the other algorithms properly before going ahead, it'll make way more sense ]
Enter Accord: The Protocol That Closes the Gap
Here's the thing about Accord: it's not just incremental. It's the first leaderless consensus protocol that's production-stable. And it does this by co-designing replication AND transactions as a single unified protocol.
Traditional systems do this:
Client → Transaction Layer (2PC) → Replication Layer (Paxos/Raft) → 4 round-trips
[coordination once] [coordination again]
Accord does this:
Client → Accord (handles BOTH transactions + replication) → 1-2 round-trips
[simultaneous coordination]
No redundant coordination.
Innovation 1: Flexible Fast-Path Electorates
Here's where it gets clever.
EPaxos uses fixed quorum rules.
Say you have 5 replicas and can tolerate 2 failures:
- Fast-path quorum: 4 out of 5 nodes, per the fast-path quorum rule:
  |F| = ⌈(|E| + f + 1) / 2⌉, so here |F| = ⌈(5 + 2 + 1) / 2⌉ = ⌈8/2⌉ = 4
- Problem: If 2 nodes actually fail, you only have 3 left. You can't gather 4 votes, so the fast-path is impossible and you're forced onto the slow path. (remember how I told you it forces you onto the slow path? this is it)
2 RTT instead of 1.
Accord's innovation? Dynamic electorates.
Instead of fixed quorum rules, Accord can exclude slow or distant nodes from the voting set without reducing fault tolerance. Same 5 replicas, 2 failures tolerated, but:
- Electorate: 3 fast nodes (the 2 slow or failed ones are excluded)
- Fast-path quorum: 3 out of 3
- Result: Fast-path always succeeds, even under maximum failures (quick sketch below)
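Here's that electorate arithmetic as a quick Python sketch (my own illustrative toy, not code from the Accord codebase; fast_path_quorum is just a name I made up):

    import math

    def fast_path_quorum(electorate_size: int, f: int) -> int:
        # the fast-path quorum rule from above: |F| = ceil((|E| + f + 1) / 2)
        return math.ceil((electorate_size + f + 1) / 2)

    # Fixed rules (EPaxos-style): all 5 replicas vote, tolerating f = 2
    print(fast_path_quorum(5, 2))   # 4 -- needs 4 of 5, unreachable once 2 nodes fail

    # Flexible electorate (Accord): shrink the voting set to the 3 fast nodes
    print(fast_path_quorum(3, 2))   # 3 -- 3 of 3, and all 3 voters are healthy

Same formula both times; shrinking the electorate is what makes the quorum reachable again.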
This is the first protocol to maintain optimal performance (fast-path) under ANY tolerated failures.
EPaxos? Its fast-path only survives f/2 failures, while Accord's survives the full f.
Innovation 2: Timestamp Reorder Buffer
Alright, imagine this scenario:
Transaction A starts in US-East at T0.
Transaction B starts in EU-West at T0 (simultaneously).
Without reordering:
- R1 (US-East) receives A at T5, processes it immediately, sees A first
- R3 (EU-West) receives B at T8, processes it immediately, sees B first
- R1 receives B at T10, but already processed A (sees order: A, B)
- R3 receives A at T12, but already processed B (sees order: B, A)
Different replicas see different orders! Fast-path fails.
Back to slow-path.
Painful T~T
Accord's fix? Intentionally buffer and wait.
Instead of processing immediately, replicas BUFFER incoming transactions and wait until T20 (enough time for all messages to likely arrive). Then they process all buffered messages sorted by timestamp:
T5 R1 receives A → BUFFER
T8 R3 receives B → BUFFER
T10 R1 receives B → BUFFER both
T12 R3 receives A → BUFFER both
T20 ALL replicas sort by timestamp → UNANIMOUS ORDER
Both see the same order. Fast-path succeeds.
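Here's a toy version of that reorder buffer in Python (a sketch under my own assumptions; ReorderBuffer, the 15 ms window, and the tie-breaking by origin are all illustrative, not the paper's actual structures):

    import heapq

    class ReorderBuffer:
        """Toy reorder buffer (my sketch, not the paper's code): hold each
        incoming transaction until its timestamp is old enough that anything
        earlier has almost certainly arrived, then release in timestamp order."""

        def __init__(self, wait):
            self.wait = wait    # network latency bound + clock skew, e.g. 15 ms
            self.heap = []      # min-heap of (timestamp, origin, txn)

        def receive(self, timestamp, origin, txn):
            # origin breaks timestamp ties, like Accord's (time, seq, id) tuples
            heapq.heappush(self.heap, (timestamp, origin, txn))

        def release(self, now):
            ready = []
            while self.heap and self.heap[0][0] <= now - self.wait:
                ready.append(heapq.heappop(self.heap))
            return ready

    # R1 receives A then B; R3 receives B then A -- opposite arrival orders
    r1, r3 = ReorderBuffer(wait=15), ReorderBuffer(wait=15)
    r1.receive(0, "us-east", "A"); r1.receive(0, "eu-west", "B")
    r3.receive(0, "eu-west", "B"); r3.receive(0, "us-east", "A")
    assert r1.release(now=20) == r3.release(now=20)   # unanimous order either way

The point: release order depends only on timestamps, never on arrival order, so every replica that waits long enough emits the exact same sequence.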
there's no way you didn't go "wtf, isn't waiting gonna make it slower??"
so, here's the kicker math:
Cost of waiting: 10-20 milliseconds (typical network latency + clock skew)
Benefit of avoiding slow-path round-trip: 50-100 milliseconds (cross-region RTT)
Net result: you pay 10-20 ms of patience to save 50-100 ms of round-trip, a 3-5x return for waiting!
And no, you don't need atomic clocks or GPS (looking at you, Spanner 👀). Just standard NTP clock sync. Accord's philosophy? Use cheap, standard time sync and just wait a bit.
Still WAY cheaper than an extra network round-trip.
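In code, the trade looks something like this (using the illustrative numbers above, not benchmarks):

    base_rtt = 75      # ms: the one cross-region round-trip you always pay
    buffer_wait = 15   # ms: reorder-buffer delay (latency bound + clock skew)

    fast_path = base_rtt + buffer_wait   # 90 ms: 1 RTT plus a little patience
    slow_path = 2 * base_rtt             # 150 ms: the extra round-trip you avoided
    print(fast_path, slow_path)          # patience wins by ~60 ms per conflict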
How Accord Actually Works: Three Phases
Let me walk through a real scenario:
transferring $100 from Alice's account (Shard A) to Bob's account (Shard B).
Both shards are replicated across 5 replicas.
Phase 1: PreAccept (Propose & Vote)
Step 1: Coordinator proposes
- Picks timestamp t = 120000.000, seq=1, id=Coordinator
- Sends to fast-path electorate (4 nodes): R1, R2, R3, R4
Step 2: Each replica checks for conflicts
- R1: "No conflicts on Alice/Bob accounts" → Accept t, no dependencies
- R2: "No conflicts" → Accept t, no dependencies
- R3: "I saw another transaction on Alice's account at t=120000.005 (higher!)" → Reject, propose t=120000.005, include Transaction 2 as dependency
- R4: "No conflicts" → Accept t, no dependencies
Step 3: Coordinator counts votes
- 3 out of 4 accept t [screw you R3 (•̀⤙•́ )]
- 1 rejects (proposes higher timestamp)
Decision point: Not unanimous? Can't use fast-path. Go to Accept phase (slow-path).
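To make Phase 1 concrete, here's a compressed sketch of the PreAccept round for this scenario (preaccept_reply and the state layout are my own illustrative inventions, not Accord's API):

    def preaccept_reply(replica_state, keys, t):
        """Replica side: accept t if no conflicting transaction on these keys
        carries a higher timestamp; otherwise propose that higher timestamp
        and report the conflicting transactions as dependencies."""
        conflicts = {txid: ts for txid, (ts, ks) in replica_state.items() if ks & keys}
        highest = max(conflicts.values(), default=0.0)
        if highest <= t:
            return t, set()                 # accept t, no dependencies
        return highest, set(conflicts)      # reject: propose higher t plus deps

    t, keys = 120000.000, {"alice", "bob"}
    r3_state = {"TX2": (120000.005, {"alice"})}    # R3 saw TX2 touch Alice's account
    replies = [(t, set()), (t, set()),             # R1, R2: no conflicts
               preaccept_reply(r3_state, keys, t), # R3: (120000.005, {"TX2"})
               (t, set())]                         # R4: no conflicts

    fast_path = all(ts == t for ts, _ in replies)  # False: 3 of 4, not unanimous

Not unanimous, so the coordinator falls through to the Accept phase below.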
Phase 2: Accept (Resolve Conflicts)
Step 1: Coordinator picks final timestamp
- Looks at all responses: max timestamp is 120000.005
- Merge all dependencies: must include Transaction 2
- tfinal = 120000.005
Step 2: Sends Accept to simple majority (3 out of 5)
- R1, R2, R3: "Accept tfinal = 120000.005, dependencies include TX2"
Step 3: All 3 accept → simple majority reached
- Timestamp is now durably decided
- TX2 must execute first
note: This phase only runs if PreAccept didn't get unanimous agreement
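The slow-path merge itself is tiny. Here's what the coordinator computes, as a sketch (again my own illustrative code, re-declaring the PreAccept replies from the sketch above):

    # final timestamp = highest proposal, dependencies = union of all reported deps
    replies = [(120000.000, set()), (120000.000, set()),
               (120000.005, {"TX2"}), (120000.000, set())]

    t_final = max(ts for ts, _ in replies)           # 120000.005
    deps = set().union(*(d for _, d in replies))     # {"TX2"}
    print(t_final, deps)   # this pair is what the Accept round makes durable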
Phase 3: Commit (Broadcast Decision)
Always runs, whether fast-path or slow-path.
Coordinator broadcasts to ALL replicas: "Transaction committed at 120000.005, dependencies resolved."
All replicas record the decision. Execute only after dependencies complete. Update Alice -100, Bob +100.
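On the replica side, the commit-phase bookkeeping boils down to something like this (the txn ids and the deps_of map are illustrative):

    executed = set()
    deps_of = {"TX2": set(), "transfer_100": {"TX2"}}   # txid -> dependencies

    def try_execute(txid):
        if deps_of[txid] <= executed:   # every dependency already applied?
            executed.add(txid)
            return True
        return False

    print(try_execute("transfer_100"))  # False: TX2 hasn't executed yet
    print(try_execute("TX2"))           # True
    print(try_execute("transfer_100"))  # True: now apply Alice -100, Bob +100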

Recovery Protocol: What if the Coordinator Crashes?
This is where Accord's design shines.
When a coordinator goes down mid-transaction:
A new replica volunteers as Recovery Coordinator and contacts a recovery quorum (f+1 replicas). By asking "what do you know about this transaction?", it determines:
- Case A: PreAccept phase succeeded → Commit immediately
- Case B: PreAccept had conflicts → Run Accept phase, then Commit
- Case C: Accept phase done → Send Commit
- Case D: Already committed → Broadcast to remaining replicas
- Case E: Never started → Abort or retry
The recovery coordinator resumes from wherever the protocol left off. No guessing, no rollbacks, no lost transactions.
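Cases A-E collapse into one small decision function. Here's a sketch (the progress labels are my shorthand for what the recovery quorum reports):

    def recover(progress):
        """What the recovery coordinator does after asking a recovery quorum
        (f+1 replicas) what they know about the stalled transaction."""
        if progress == "committed":            # Case D
            return "broadcast commit to remaining replicas"
        if progress == "accepted":             # Case C
            return "send Commit"
        if progress == "preaccept_conflicts":  # Case B
            return "run Accept phase, then Commit"
        if progress == "preaccepted":          # Case A
            return "commit immediately"
        return "abort or retry"                # Case E: never started

    print(recover("accepted"))   # picks up exactly where the protocol stopped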
Real-World Impact: Cassandra Gets ACID Transactions
Here's why this matters for Cassandra users:
Before Accord:
- No cross-partition transactions
- Eventually consistent by default
- Lightweight transactions limited to single partition
- Applications must handle coordination (nightmare)
After Accord (CEP-15, Cassandra 5.0):
- Full ACID transactions across partitions
- Strict serializability (strongest guarantee)
- Geographic distribution without sacrificing consistency
- Database handles coordination automatically
Cassandra's famous for leaderless, linear scalability. Accord lets it keep that and offer ACID transactions, without Spanner's expensive hardware and without Raft/Paxos bottlenecks.
The Technical Achievements
1. Leaderless can be production-stable: previous leaderless attempts (EPaxos, TAPIR, Janus) had critical instability issues. Accord is the first to ship.
2. Matches leader-based performance without the bottleneck: optimal latency, no SPOF, better geographic distribution.
3. Predictable performance under ANY conditions: failures? Fast-path survives. Contention? A guaranteed, bounded slow-path. Transactions always complete.
4. Novel mechanisms:
- Flexible electorates: Dynamic quorum adjustment
- Timestamp reorder buffer: Conflict avoidance through intentional delay
- Co-designed protocol: Unified replication + transactions
It's elegant. It's practical. It uses standard infrastructure (NTP, not atomic clocks). It's shipping in real databases. And it proves that "leaderless" and "production-ready" aren't contradictions.
That's worth talking about ٩(^ᗜ^ )و
TL;DR: Accord is the first production-stable leaderless consensus protocol. It achieves this by:
- Co-designing transactions + replication as one protocol (no redundant coordination)
- Using dynamic electorates to maintain fast-path even under failures
- Buffering transactions by timestamp to ensure consistent ordering across replicas
- Recovering elegantly when coordinators fail
Result: ACID transactions across geographic regions at leaderless scale.
References:
Read the whitepaper here