Simplifying Software Defect Prediction (via the "early bird" Heuristic) - 专知 (Zhuanzhi) Papers


December 31, 2021

Simplifying Software Defect Prediction (via the "early bird" Heuristic)


N. C. Shrikanth, Tim Menzies

From arXiv, 41 pages (under review)

Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, then perhaps a model learned from that region would suffice for the rest of the project. To support this claim, we offer a case study with 240 GitHub projects, where we find that the information in those projects "clumped" towards the earliest parts of the project. A defect prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this early life cycle data, we can build models very quickly, very early in the software project life cycle. Moreover, using this method, we have shown that a simple model (with just two features) generalizes to hundreds of software projects. Based on this experience, we suspect that prior work on generalizing software engineering defect prediction models may have needlessly complicated an inherently simple process. Further, prior work that focused on later-life cycle data needs to be revisited, since its conclusions were drawn from relatively uninformative regions. Replication note: all our data and scripts are online at https://github.com/snaraya7/simplifying-software-analytics
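To make the "early bird" idea concrete, here is a minimal sketch of training a two-feature defect predictor on only the first 150 commits and applying it to the rest of a project. Everything in it is an assumption made for illustration: the abstract does not name the learner or the two features, so the sketch uses a plain logistic regression, placeholder features, and a synthetic commit history. The function names (`early_bird_split`, `fit_two_feature_model`, `synthetic_history`) are hypothetical and are not taken from the authors' replication scripts.

```python
import math
from typing import List, Tuple

# (timestamp, feature_1, feature_2, is_defective); field names are illustrative only.
Commit = Tuple[int, float, float, int]


def early_bird_split(commits: List[Commit], n_early: int = 150):
    """Sort commits chronologically and return (first n_early, the rest)."""
    ordered = sorted(commits, key=lambda c: c[0])
    return ordered[:n_early], ordered[n_early:]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_two_feature_model(train: List[Commit], epochs: int = 2000, lr: float = 1.0):
    """Batch gradient descent for logistic regression on two features.

    The choice of learner is an assumption; the abstract only says the
    model is simple and uses two features.
    """
    w1 = w2 = b = 0.0
    n = len(train)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for _, x1, x2, y in train:
            err = _sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(model, commit: Commit) -> int:
    """Return 1 if the commit is predicted defective, else 0."""
    w1, w2, b = model
    _, x1, x2, _ = commit
    return 1 if _sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5 else 0


def accuracy(model, commits: List[Commit]) -> float:
    hits = sum(predict(model, c) == c[3] for c in commits)
    return hits / len(commits)


def synthetic_history(n: int = 400) -> List[Commit]:
    """Toy, deterministic commit history (not data from the paper)."""
    rows = []
    for i in range(n):
        a, b = (i * 37) % 100, (i * 53) % 100
        rows.append((i, a / 100.0, b / 100.0, 1 if a + b > 100 else 0))
    return rows[::-1]  # deliberately out of order, so the split must sort


early, later = early_bird_split(synthetic_history())
model = fit_two_feature_model(early)
print(f"trained on {len(early)} commits, "
      f"accuracy on {len(later)} later commits: {accuracy(model, later):.2f}")
```

The point of the sketch is the split: the model never sees anything after commit 150, yet it is evaluated on the whole remaining history. With real project data, the toy generator would be replaced by commit-level metrics and defect labels, and accuracy by whatever evaluation measures the study reports.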


