Distributed caching systems (e.g., Memcached) are widely used by service providers to serve requests from millions of concurrent clients. Given their large scale, modern distributed systems rely on a middleware layer to manage caching nodes, to make applications easier to develop, and to apply load balancing and replication strategies. In this work, we performed a dependability evaluation of three popular middleware platforms, namely Twemproxy by Twitter, Mcrouter by Facebook, and Dynomite by Netflix, to assess availability and performance under faults, including failures of Memcached nodes and congestion due to unbalanced workloads and network link bandwidth bottlenecks. We point out the different availability and performance trade-offs achieved by the three platforms, and identify scenarios in which a few faulty components cause cascading failures of the whole distributed system.