High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning, which underlies many deployed ML systems. The ability to compare techniques for improving these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking due to a range of reasons, including: compute availability for extensive tuning, incorporation of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally, we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code is available at https://github.com/google/uncertainty-baselines.
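To make the kind of metrics reported by these baselines concrete, the sketch below computes two of them, negative log-likelihood (NLL) and expected calibration error (ECE), for a toy Keras classifier. It is a minimal illustration in plain TensorFlow and NumPy, not the Uncertainty Baselines API; the model, data, and bin count are placeholder assumptions.

```python
# Minimal sketch (not the Uncertainty Baselines API): computing two common
# uncertainty metrics, NLL and ECE, for a toy classifier on synthetic data.
import numpy as np
import tensorflow as tf

# Placeholder data and model; the actual baselines use e.g. CIFAR-10 with a Wide ResNet.
x = np.random.rand(256, 16).astype("float32")
y = np.random.randint(0, 10, size=256)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),  # logits
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.fit(x, y, epochs=1, verbose=0)

probs = tf.nn.softmax(model(x), axis=-1).numpy()

# Negative log-likelihood: mean negative log-probability of the true label.
nll = -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))

# Expected calibration error with 15 equal-width confidence bins (a common choice).
confidences = probs.max(axis=-1)
accuracies = (probs.argmax(axis=-1) == y).astype("float32")
bin_edges = np.linspace(0.0, 1.0, 16)
ece = 0.0
for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
    mask = (confidences > lo) & (confidences <= hi)
    if mask.any():
        # Weight each bin's |accuracy - confidence| gap by the fraction of examples in it.
        ece += mask.mean() * abs(accuracies[mask].mean() - confidences[mask].mean())

print(f"NLL: {nll:.3f}  ECE: {ece:.3f}")
```

In the released baselines, these metrics are computed alongside others (e.g., accuracy and out-of-distribution robustness measures) inside each self-contained experiment pipeline, so that methods can be compared on a shared footing.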