Validating Alerts in Cloud-Native Observability

Observability and alerting form the backbone of modern reliability engineering. Alerts help teams catch faults early, before they turn into production outages, and serve as first clues for troubleshooting. However, designing effective alerts is challenging: they need to strike a fine balance between catching issues early and minimizing false alarms. In addition, alerts often cover uncommon faults, so the alerting code is rarely executed and therefore rarely checked. To address these challenges, several industry practitioners advocate testing alerting code with the same rigor as application code. Still, there is a lack of tools that support such systematic design and validation of alerts. This paper introduces a new alerting extension for the observability experimentation tool OXN. It lets engineers experiment with alerts early during development. With OXN, engineers can tune rules at design time and routinely validate the firing behavior of their alerts, avoiding future problems at runtime.

