In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependency RL for program statements. These two types of RL, which operate on the dynamic information in a code coverage matrix, are combined with code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation, in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies) and, at the same time, examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns that discriminate between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is assessed by taking into account its data dependencies on other statements in the execution and data flows, in addition to the statement itself. Finally, the vector representations for the code coverage matrix, the data dependencies among statements, and the source code are combined and used as the input of a classifier built from a Convolutional Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines by 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines by 15.0% to 206.3%.
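To make the coverage-matrix preparation concrete, the following is a minimal sketch and not the DeepRL4FL implementation. The abstract does not state the ordering criterion or the marking scheme, so both are assumptions here: failing tests are placed first and then sorted by the number of covered statements, and an error-exhibiting statement is marked with -1. The names `build_coverage_image` and `error_stmt` are hypothetical.

```python
from typing import Dict, List, Tuple


def build_coverage_image(
    coverage: List[List[int]],
    passed: List[bool],
    error_stmt: Dict[int, int],
) -> Tuple[List[List[int]], List[int]]:
    """Order test cases and mark error-exhibiting statements.

    coverage[t][s] is 1 if test t executes statement s, else 0.
    passed[t] is True if test t passed.
    error_stmt maps a failing test index to the statement index where
    the failure surfaced (hypothetical input format).

    Assumed ordering: failing tests first, then more coverage first.
    Assumed marking: -1 in the cell of an error-exhibiting statement.
    """
    # Sort key: False (failed) sorts before True (passed); within each
    # group, tests covering more statements come first. sorted() is
    # stable, so ties keep their original relative order.
    order = sorted(
        range(len(coverage)),
        key=lambda t: (passed[t], -sum(coverage[t])),
    )

    image = []
    for t in order:
        row = list(coverage[t])  # copy so the input is not mutated
        if not passed[t] and t in error_stmt:
            row[error_stmt[t]] = -1
        image.append(row)
    return image, order


# Three tests over four statements; test 1 fails at statement 3.
coverage = [
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
]
passed = [True, False, True]
image, order = build_coverage_image(coverage, passed, {1: 3})
print(order)
for row in image:
    print(row)
```

The resulting matrix is the kind of two-dimensional input that a convolutional classifier could scan for patterns. In the approach described above, it would be combined with the data dependency and source code representations before classification.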