Statistical fault localization (SFL) techniques use execution profiles and success/failure information from software executions, together with statistical inference, to automatically score program elements according to how likely they are to be faulty. SFL techniques typically employ only one type of profile data: coverage data, predicate outcomes, or variable values. Most SFL techniques measure correlation, not causation, between profile values and success/failure, so they are subject to confounding bias that distorts the scores they produce. This paper presents a new SFL technique, named \emph{UniVal}, that uses causal inference techniques and machine learning to integrate information about both predicate outcomes and variable values, in order to estimate the true failure-causing effect of program statements more accurately. \emph{UniVal} was empirically compared with several coverage-based, predicate-based, and value-based SFL techniques on 800 program versions containing real faults.
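To make the distinction between correlation and causal effect concrete, the following is a minimal sketch under stated assumptions. It is not UniVal's model, which the abstract describes only as combining causal inference with machine learning over predicate outcomes and variable values. Instead, it uses plain stratification (backdoor adjustment) over a single binary confounder. Each run is a hypothetical triple (covered, c, failed), where covered says whether statement s executed, c is an assumed predicate outcome that both makes s more likely to run and independently raises the chance of failure, and failed is the run's outcome. The helper names (failure_rate, naive_effect, adjusted_effect, make_runs) and the toy data are illustrative only.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical run record: (covered, c, failed), each 0 or 1.
Run = Tuple[int, int, int]


def failure_rate(runs: List[Run]) -> float:
    """Fraction of the given runs that failed."""
    if not runs:
        raise ValueError("empty group: effect not identifiable from this data")
    return sum(r[2] for r in runs) / len(runs)


def naive_effect(runs: List[Run]) -> float:
    """Correlational score: P(fail | covered) - P(fail | not covered)."""
    cov = [r for r in runs if r[0] == 1]
    unc = [r for r in runs if r[0] == 0]
    return failure_rate(cov) - failure_rate(unc)


def adjusted_effect(runs: List[Run]) -> float:
    """Backdoor adjustment over one binary confounder c:
    sum over c of P(c) * (P(fail | covered, c) - P(fail | not covered, c))."""
    strata = defaultdict(list)
    for r in runs:
        strata[r[1]].append(r)
    total = len(runs)
    effect = 0.0
    for group in strata.values():
        cov = [r for r in group if r[0] == 1]
        unc = [r for r in group if r[0] == 0]
        effect += (len(group) / total) * (failure_rate(cov) - failure_rate(unc))
    return effect


def make_runs() -> List[Run]:
    """Toy data (assumed): s has no effect on failure within either stratum."""
    runs: List[Run] = []
    # c = 1: s usually runs, and half of the runs fail whether or not s runs.
    runs += [(1, 1, 1)] * 4 + [(1, 1, 0)] * 4
    runs += [(0, 1, 1)] * 1 + [(0, 1, 0)] * 1
    # c = 0: s rarely runs, and no run fails.
    runs += [(1, 0, 0)] * 2 + [(0, 0, 0)] * 8
    return runs


if __name__ == "__main__":
    runs = make_runs()
    print(round(naive_effect(runs), 2))     # 0.3: spurious, driven by c
    print(round(adjusted_effect(runs), 2))  # 0.0: no effect once c is held fixed
```

Because c raises both the chance that s executes and the chance of failure, the correlational score makes s look suspicious (0.3), while comparing runs only within the same value of c removes that confounding bias and yields 0.0. Real programs have many confounders with continuous values, which is where a learned model of the kind the abstract mentions would replace the simple per-stratum averages used here.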