Microservice management and testbed research often rests on assumptions about deployments that have rarely been validated at production scale. While recent studies have begun to characterise production microservice deployments, they are often limited in breadth, do not compare findings across deployments, and do not consider the implications of their findings for commonly held assumptions. We analyse a distributed tracing dataset from Alibaba's production microservice deployment to examine its scale, heterogeneity, and dynamicity. By comparing our findings to prior measurements of Meta's microservice architecture (MSA), we illustrate both convergent and divergent properties, clarifying which patterns may generalise. Our study reveals extreme architectural scale, long-tail distributions of workloads and dependencies, highly diverse functionality, substantial call graph variability, and pronounced time-varying behaviour, all of which diverge from assumptions underlying research models and testbeds. We summarise how these observations challenge common assumptions in research on fault management, scaling, and testbed design, and outline recommendations for more realistic future approaches and evaluations.