Graph-Based Machine Learning Improves Just-in-Time Defect Prediction - Zhuanzhi Papers

Defect prediction · Graph-based · Machine learning · Software · Extraction

April 14, 2023

Graph-Based Machine Learning Improves Just-in-Time Defect Prediction

Jonathan Bryan, Pablo Moriano

From arXiv. 22 pages, 2 figures, 4 tables; references added; expanded results to match baseline conditions.

The increasing complexity of today's software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of changes required to build software. By leveraging these contribution graphs, our research shows the potential of using graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis using graph-based ML for classifying edges that represent defect-prone changes. This new framing of the JIT defect prediction problem leads to remarkably better results. We test our approach on 14 open-source projects and show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 77.55% and a Matthews correlation coefficient (MCC) as high as 53.16%. This represents a 152% higher F1 score and a 3% higher MCC over the state-of-the-art JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.
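The abstract reframes JIT defect prediction as edge classification: each code change is an edge between a developer node and a source-file node, and the task is to label that edge as defect-prone or not. The sketch below illustrates only that framing and the two reported metrics. It is not the authors' pipeline. The change records, the two structural features (how many files the developer has touched so far, how many developers have touched the file so far), and the threshold rule standing in for a trained graph-based model are all hypothetical. Features for each change are computed only from earlier changes, as a just-in-time setting would require.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical change records in commit order: (developer, file, is_defect_prone).
CHANGES = [
    ("alice", "core.c", False),
    ("bob", "core.c", False),
    ("carol", "core.c", True),
    ("alice", "util.c", False),
    ("dave", "core.c", True),
]


def featurize_in_order(changes):
    """Return (features, label) for each developer-file edge, in commit order.

    Features come from the contribution graph built from *earlier* changes
    only, so the change being predicted does not leak into its own features.
    """
    files_by_dev = defaultdict(set)  # developer node -> neighbouring file nodes
    devs_by_file = defaultdict(set)  # file node -> neighbouring developer nodes
    rows = []
    for dev, path, label in changes:
        # Hypothetical structural features: degree of each endpoint so far.
        features = (len(files_by_dev[dev]), len(devs_by_file[path]))
        rows.append((features, label))
        # Add the edge to the graph only after it has been featurized.
        files_by_dev[dev].add(path)
        devs_by_file[path].add(dev)
    return rows


def toy_rule(features, min_file_devs=2):
    """Hypothetical placeholder for a trained edge classifier."""
    _dev_degree, file_degree = features
    return file_degree >= min_file_devs


def f1_and_mcc(y_true, y_pred):
    """F1 score and Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


if __name__ == "__main__":
    rows = featurize_in_order(CHANGES)
    labels = [label for _, label in rows]
    preds = [toy_rule(features) for features, _ in rows]
    print(f1_and_mcc(labels, preds))
```

On this toy data the placeholder rule happens to separate the labels perfectly, and that says nothing about real projects. MCC is reported alongside F1 because it also accounts for true negatives, which matters when defect-prone changes are a minority of all changes.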

Related content

Related topic: Defect prediction

Related VIP content

Handbook "Wargaming: Tools, Techniques and Procedures" (33 slides), Connections UK – Wargaming for Professionals · Zhuanzhi Member Services · October 10, 2022

[MIT-ICLR2022] Injecting fairness into machine-learning models · Zhuanzhi Member Services · March 7, 2022

[Book] Real-World Machine Learning (264-page PDF) · Zhuanzhi Member Services · April 5, 2020

[Paper recommendation] Machine Learning based Anomaly Detection for 5G Networks · Zhuanzhi Member Services · March 12, 2020

[CHI2020-Microsoft] Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning · Zhuanzhi Member Services · March 8, 2020

UC Berkeley CS189 lecture notes: A Comprehensive Guide to Machine Learning (185-page PDF) · Zhuanzhi Member Services · January 16, 2020

[O'Reilly AI Conference 2019] Building contextual AI assistants with machine learning and open source tools, Tyler Dunn (Rasa product manager) · Zhuanzhi Member Services · November 5, 2019

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation · Zhuanzhi Member Services · October 17, 2019

Latest reinforcement learning tutorial (17-page PDF) · Zhuanzhi Member Services · October 11, 2019

Related news

A brief introduction to Contrastive Learning, Part 1 · PaperWeekly · June 10, 2022

[Book] Practical Time Series Analysis and Prediction with Statistics and Machine Learning · Zhuanzhi · April 9, 2022

Hierarchically Structured Meta-learning · CreateAMind · May 22, 2019

Transferring Knowledge across Learning Processes · CreateAMind · May 18, 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019 · 待字闺中 · December 24, 2018

Predicting financial time series with dynamic deep learning in Python · Quantitative Investment and Machine Learning · October 30, 2018

[Recommended] Augmented reality with Python/OpenCV · Machine Learning Research Society · November 16, 2017

Capsule Networks explained · Machine Learning Research Society · November 12, 2017

[Recommended] Understanding LSTM with TensorFlow · Machine Learning Research Society · September 11, 2017

[Recommended] Time series prediction with RNN/LSTM · Machine Learning Research Society · September 8, 2017

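Related funding (National Natural Science Foundation of China)

Key techniques and tools for software security analysis · December 31, 2014

Effect of scale on the fracture performance and integrity assessment of defect-containing and welded structures · December 31, 2013

Dynamic environment perception from images aided by inertial and high-order features · December 31, 2013

A comprehensive meta-analysis of the defect-prediction ability of code metrics · December 31, 2013

Short-text topic detection and tracking for microblog platforms · December 31, 2013

Methods for just-in-time defect prediction of software changes · December 31, 2013

Key techniques for quantitative evaluation of deep defects in multilayer plate components using pulsed remote-field eddy currents · December 31, 2012

BESIII offline physics software and scheduling strategies on multi-core platforms · December 31, 2012

Key techniques for improving Max-SAT algorithms · December 31, 2009

Improving statistical machine translation with machine learning · December 31, 2009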

Related papers (arXiv)

SpotTarget: Rethinking the Effect of Target Edges for Link Prediction in Graph Neural Networks · June 1, 2023

MeROS: SysML-based Metamodel for ROS-based Systems · June 1, 2023

Uncertainty-Aware Unlikelihood Learning Improves Generative Aspect Sentiment Quad Prediction · June 1, 2023

Domain Adaptive Decision Trees: Implications for Accuracy and Fairness · May 31, 2023

Maximin optimal cluster randomized designs for assessing treatment effect heterogeneity · May 30, 2023

A Survey on Automated Driving System Testing: Landscapes and Trends · June 13, 2022

Automated Graph Machine Learning: Approaches, Libraries and Directions · January 4, 2022

Graph Neural Network for Traffic Forecasting: A Survey · January 27, 2021

Causality for Machine Learning · November 24, 2019

Interpretable machine learning: definitions, methods, and applications · January 14, 2019