MutDafny: A Mutation-Based Approach to Assess Dafny Specifications

This paper explores the use of mutation testing to reveal weaknesses in formal specifications written in Dafny. In verification-aware programming languages such as Dafny, specifications play a critical role, yet they are as prone to errors as implementations. Flaws in specifications can result in formally verified programs that deviate from the intended behavior. We present MutDafny, a tool that increases the reliability of Dafny specifications by automatically signaling potential weaknesses. Following a mutation testing approach, we introduce faults (mutations) into the code and rely on the formal specifications to detect them. If a mutated program still verifies, this may indicate a weakness in the specification. We extensively analyze mutation operators from popular tools, identifying the ones applicable to Dafny. In addition, we synthesize new operators tailored to Dafny from bug-fix commits in publicly available Dafny projects on GitHub. Drawing from both, we equip our tool with a total of 32 mutation operators. We evaluate MutDafny's effectiveness and efficiency on a dataset of 794 real-world Dafny programs and manually analyze a subset of the resulting undetected mutants, identifying five weak real-world specifications (on average, one every 241 lines of code) that would benefit from strengthening.
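
The core idea, that a mutant which still verifies points to a weak specification, can be illustrated with a small sketch. This is not MutDafny's implementation: it is written in Python rather than Dafny, the function names, the two mutants, and both specifications are hypothetical, and an exhaustive check over a small input range stands in for the Dafny verifier (an assumption made only to keep the example runnable).

```python
from typing import Callable, Dict, List

Impl = Callable[[int, int], int]
Spec = Callable[[int, int, int], bool]  # postcondition over (a, b, result)


def original_max(a: int, b: int) -> int:
    """Reference implementation: the larger of a and b."""
    return a if a > b else b


def mutant_relational(a: int, b: int) -> int:
    """Hypothetical mutant: relational operator `>` replaced by `<`."""
    return a if a < b else b


def mutant_plus_one(a: int, b: int) -> int:
    """Hypothetical mutant: then-branch result `a` replaced by `a + 1`."""
    return a + 1 if a > b else b


def weak_spec(a: int, b: int, r: int) -> bool:
    """Weak postcondition: the result is an upper bound of both inputs."""
    return r >= a and r >= b


def strong_spec(a: int, b: int, r: int) -> bool:
    """Stronger postcondition: the result must also be one of the inputs."""
    return r >= a and r >= b and (r == a or r == b)


def verifies(impl: Impl, spec: Spec, bound: int = 3) -> bool:
    """Stand-in for a verifier: check the postcondition on every input
    pair in [-bound, bound]. A real verifier proves it for all inputs."""
    values = range(-bound, bound + 1)
    return all(spec(a, b, impl(a, b)) for a in values for b in values)


MUTANTS: Dict[str, Impl] = {
    "relational_operator": mutant_relational,
    "then_branch_plus_one": mutant_plus_one,
}


def surviving_mutants(spec: Spec) -> List[str]:
    """Names of mutants that still satisfy the spec, i.e. go undetected."""
    return [name for name, impl in MUTANTS.items() if verifies(impl, spec)]


if __name__ == "__main__":
    print("weak spec survivors:  ", surviving_mutants(weak_spec))
    print("strong spec survivors:", surviving_mutants(strong_spec))
```

In this sketch the weak postcondition detects the relational-operator mutant but lets the `a + 1` mutant through, while the strengthened postcondition detects both. A surviving mutant of this kind is the signal that a specification may need strengthening, although a survivor can also be an equivalent mutant, which is why the abstract describes a manual analysis step.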
