This paper focuses on effective user diagnostics generated during the deductive verification of probabilistic programs. Our key principle is based on providing slices for (1) error reporting, (2) proof simplification, and (3) preserving successful verification results. By formally defining these different notions on HeyVL, an existing quantitative intermediate verification language (IVL), our concepts (and implementation) can be used to obtain diagnostics for a range of probabilistic programming languages. Slicing for error reporting is a novel notion of error localization for quantitative assertions. We demonstrate slicing-based diagnostics on a variety of proof rules such as quantitative versions of the specification statement and invariant-based loop rules, and formally prove the correctness of specialized error messages and verification hints. We implemented our user diagnostics into the deductive verifier Caesar. Our novel implementation -- called \emph{Brutus} -- can search for slices which do or do not verify, corresponding to each of the three diagnostic notions. For error reporting (1), it exploits a binary search-based algorithm that minimizes error-witnessing slices. To solve for slices that verify (2 and 3), we empirically compare different algorithms based on unsatisfiable cores, minimal unsatisfiable subset enumeration, and a direct SMT encoding of the slicing problem. Our empirical evaluation of Brutus on existing and new benchmarks shows that we can find slices that are both small and informative.
翻译:本文聚焦于概率程序演绎验证过程中生成的有效用户诊断信息。我们的核心原理基于为以下三个方面提供切片:(1) 错误报告,(2) 证明简化,以及 (3) 保留成功验证结果。通过在现有定量中间验证语言 HeyVL 上形式化定义这些不同概念,我们的理论框架(及实现)可用于为多种概率编程语言生成诊断信息。面向错误报告的切片是定量断言错误定位的新颖概念。我们在多种证明规则(如规范语句的定量版本和基于不变量的循环规则)上展示了基于切片的诊断方法,并形式化证明了专用错误消息与验证提示的正确性。我们将用户诊断功能集成至演绎验证器 Caesar 中实现。我们命名为 \emph{Brutus} 的创新实现能够搜索验证通过或未通过的切片,分别对应三种诊断概念。对于错误报告(1),系统采用基于二分搜索的算法以最小化错误见证切片。针对验证通过的切片求解(2和3),我们通过实验比较了基于不可满足核、最小不可满足子集枚举以及切片问题直接 SMT 编码的不同算法。在现有及新基准测试中对 Brutus 的实证评估表明,我们能够找到既精简又信息丰富的切片。