使用代码覆盖代表制学习的错误地方化 (Fault Localization with Code Coverage Representation Learning)

In this paper, we propose DeepRL4FL, a deep learning fault localization (FL) approach that locates the buggy code at the statement and method levels by treating FL as an image pattern recognition problem. DeepRL4FL does so via novel code coverage representation learning (RL) and data dependencies RL for program statements. Those two types of RL on the dynamic information in a code coverage matrix are also combined with the code representation learning on the static information of the usual suspicious source code. This combination is inspired by crime scene investigation in which investigators analyze the crime scene (failed test cases and statements) and related persons (statements with dependencies), and at the same time, examine the usual suspects who have committed a similar crime in the past (similar buggy code in the training data). For the code coverage information, DeepRL4FL first orders the test cases and marks error-exhibiting code statements, expecting that a model can recognize the patterns discriminating between faulty and non-faulty statements/methods. For dependencies among statements, the suspiciousness of a statement is seen taking into account the data dependencies to other statements in execution and data flows, in addition to the statement by itself. Finally, the vector representations for code coverage matrix, data dependencies among statements, and source code are combined and used as the input of a classifier built from a Convolution Neural Network to detect buggy statements/methods. Our empirical evaluation shows that DeepRL4FL improves the top-1 results over the state-of-the-art statement-level FL baselines from 173.1% to 491.7%. It also improves the top-1 results over the existing method-level FL baselines from 15.0% to 206.3%.

翻译：在本文中,我们提议DeepRL4FL, 这是一种深学习错误定位法(FL), 这是一种深学习错误定位法(FL) 方法, 将FL视为图像模式识别问题。 DeepRL4FL通过对程序语句的新型代码覆盖代表学习(RL)和数据依赖关系RL来这样做。关于代码覆盖矩阵中的动态信息的两种 RL(RL) 与对常用可疑源代码的静态信息进行代码代表学习相结合。这种结合是由犯罪现场调查所启发的,调查人员在其中分析犯罪现场(未完成测试案件和声明)及相关人员(有依赖关系的声明),同时, 检查过去实施类似犯罪的通常嫌疑人(培训数据中存在相似的错误代码)。 DeepRL4LFL首先订购测试案例并标记限制错误代码声明, 期望模型能够识别错误和不失实报表/方法之间的模式。对于各种报表, 评估层次的可疑性说明(有依赖关系), 也看到对当前数据定义值的内值, 和内值的内值的内值显示数据流的内脏数据流。

相关内容

表示学习

关注 185

表示学习是通过利用训练数据来学习得到向量表示，这可以克服人工方法的局限性。表示学习通常可分为两大类，无监督和有监督表示学习。大多数无监督表示学习方法利用自动编码器（如去噪自动编码器和稀疏自动编码器等）中的隐变量作为表示。目前出现的变分自动编码器能够更好的容忍噪声和异常值。然而，推断给定数据的潜在结构几乎是不可能的。目前有一些近似推断的策略。此外，一些无监督表示学习方法旨在近似某种特定的相似性度量。提出了一种无监督的相似性保持表示学习框架，该框架使用矩阵分解来保持成对的DTW相似性。通过学习保持DTW的shaplets，即在转换后的空间中的欧式距离近似原始数据的真实DTW距离。有监督表示学习方法可以利用数据的标签信息，更好地捕获数据的语义结构。孪生网络和三元组网络是目前两种比较流行的模型，它们的目标是最大化类别之间的距离并最小化了类别内部的距离。

【CVPR2020】在线深度聚类的无监督表示学习, Online Deep Clustering for Unsupervised Representation Learning

专知会员服务

69+阅读 · 2020年6月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【AAAI2020-Tutorial-Penn】迁移表示学习最新进展，Recent Advances in Transferable Representation Learning

专知会员服务

52+阅读 · 2020年2月8日

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

97+阅读 · 2019年12月23日