Practical Mutation Testing at Scale

Mutation analysis assesses a test suite's adequacy by measuring its ability to detect small artificial faults, systematically seeded into the tested program. Mutation analysis is considered one of the strongest test-adequacy criteria. Mutation testing builds on top of mutation analysis and is a testing technique that uses mutants as test goals to create or improve a test suite. Mutation testing has long been considered intractable because the sheer number of mutants that can be created represents an insurmountable problem, both in terms of human and computational effort. This has hindered the adoption of mutation testing as an industry standard. For example, Google has a codebase of two billion lines of code and more than 500,000,000 tests are executed on a daily basis. The traditional approach to mutation testing does not scale to such an environment. To address these challenges, this paper presents a scalable approach to mutation testing based on the following main ideas: (1) Mutation testing is done incrementally, mutating only changed code during code review, rather than the entire codebase; (2) Mutants are filtered, removing mutants that are likely to be irrelevant to developers, and limiting the number of mutants per line and per code review process; (3) Mutants are selected based on the historical performance of mutation operators, further eliminating irrelevant mutants and improving mutant quality. Evaluation in a code-review-based setting with more than 24,000 developers on more than 1,000 projects shows that the proposed approach produces orders of magnitude fewer mutants and that context-based mutant filtering and selection improve mutant quality and actionability. Overall, the proposed approach represents a mutation testing framework that seamlessly integrates into the software development workflow and is applicable up to large-scale industrial settings.
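The three ideas listed in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The `Mutant` record, the `select_mutants` function, the operator labels, the relevance predicate, the score table, and the default limits are all hypothetical names and values chosen for illustration. The sketch assumes that candidate mutants have already been generated and that a per-operator historical score (for example, how often developers acted on that operator's mutants) is available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutant:
    line: int       # line number the mutant modifies
    operator: str   # label of the mutation operator (hypothetical names)


def select_mutants(changed_lines, candidates, is_irrelevant, operator_score,
                   max_per_line=1, max_per_review=5):
    """Pick the mutants to surface in one code review (illustrative only)."""

    def score(mutant):
        # Operators with no recorded history default to 0.0 (an assumption).
        return operator_score.get(mutant.operator, 0.0)

    by_line = {}
    for mutant in candidates:
        # (1) Incremental: ignore mutants outside the lines changed in the review.
        if mutant.line not in changed_lines:
            continue
        # (2) Filtering: drop mutants judged irrelevant to developers.
        if is_irrelevant(mutant):
            continue
        by_line.setdefault(mutant.line, []).append(mutant)

    chosen = []
    for mutants in by_line.values():
        # (3) Selection: prefer operators with the best historical performance,
        # and cap the number of mutants kept per line.
        mutants.sort(key=score, reverse=True)
        chosen.extend(mutants[:max_per_line])

    # Cap the total per review; ties are broken by line number.
    chosen.sort(key=lambda m: (-score(m), m.line))
    return chosen[:max_per_review]


if __name__ == "__main__":
    candidates = [
        Mutant(10, "AOR"), Mutant(10, "SBR"),
        Mutant(11, "LCR"), Mutant(11, "LOG"),
        Mutant(42, "SBR"),  # line 42 is not part of the change
    ]
    history = {"AOR": 0.6, "SBR": 0.8, "LCR": 0.5}
    picked = select_mutants(
        changed_lines={10, 11},
        candidates=candidates,
        is_irrelevant=lambda m: m.operator == "LOG",  # e.g. mutants in logging code
        operator_score=history,
    )
    for mutant in picked:
        print(mutant)
```

Because the limits are applied per line and per review, the number of surfaced mutants is bounded by the size of the change rather than by the size of the codebase, which is the scaling property the abstract emphasizes.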
