Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases latency for clients that are not co-located with it. In response to these drawbacks, Egalitarian Paxos introduced an alternative, leaderless approach that allows replicas to order commands collaboratively. Not relying on a single leader allows the protocol to maintain non-zero throughput when up to $f$ of the $n = 2f+1$ processes crash, whichever processes these are. The protocol furthermore allows any process to execute a command $c$ fast, in $2$ message delays, provided that no more than $e = \lceil\frac{f+1}{2}\rceil$ other processes fail and all concurrently submitted commands commute with $c$; the latter condition is often satisfied in practical systems. Egalitarian Paxos has served as a foundation for many other replication protocols. However, the protocol is very complex, ambiguously specified, and suffers from nontrivial bugs. In this paper, we present EPaxos*, a simpler and correct variant of Egalitarian Paxos. Our key technical contribution is a simpler failure-recovery algorithm, which we have rigorously proved correct. Our protocol also generalizes Egalitarian Paxos to cover the whole spectrum of failure thresholds $f$ and $e$ such that $n \ge \max\{2e+f-1, 2f+1\}$, a number of processes that we show to be optimal.
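
To make the threshold arithmetic concrete, the following minimal sketch evaluates the two formulas stated above: the fast-path threshold $e = \lceil\frac{f+1}{2}\rceil$ of Egalitarian Paxos and the bound $n \ge \max\{2e+f-1, 2f+1\}$. The function names `epaxos_fast_threshold` and `min_processes` are hypothetical and chosen only for illustration; the code performs the arithmetic and is not part of the protocol itself.

```python
def epaxos_fast_threshold(f: int) -> int:
    """Return e = ceil((f + 1) / 2) using integer arithmetic.

    Hypothetical helper: the name is an assumption made for this sketch.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    # ceil((f + 1) / 2) equals (f + 2) // 2 for non-negative integers f.
    return (f + 2) // 2


def min_processes(f: int, e: int) -> int:
    """Return the smallest n satisfying n >= max(2e + f - 1, 2f + 1).

    Hypothetical helper: the name is an assumption made for this sketch.
    """
    if f < 0 or e < 0:
        raise ValueError("f and e must be non-negative")
    return max(2 * e + f - 1, 2 * f + 1)


if __name__ == "__main__":
    for f in range(1, 6):
        e = epaxos_fast_threshold(f)
        print(f"f={f} e={e} n_min={min_processes(f, e)}")
```

With $e = \lceil\frac{f+1}{2}\rceil$, the bound evaluates to $2f+1$ for every $f$, which matches the setting $n = 2f+1$ of Egalitarian Paxos; choosing a larger $e$ for the same $f$, for example $f = 3$ and $e = 3$, raises the bound to $8$.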