Background: Machine Learning (ML) methods are being increasingly used for automating different activities, e.g., Test Case Prioritization (TCP), of Continuous Integration (CI). However, ML models need frequent retraining as a result of changes in the CI environment, more commonly known as data drift. Also, continuously retraining ML models consume a lot of time and effort. Hence, there is an urgent need of identifying and evaluating suitable approaches that can help in reducing the retraining efforts and time for ML models used for TCP in CI environments. Aims: This study aims to investigate the performance of using data drift detection techniques for automatically detecting the retraining points for ML models for TCP in CI environments without requiring detailed knowledge of the software projects. Method: We employed the Hellinger distance to identify changes in both the values and distribution of input data and leveraged these changes as retraining points for the ML model. We evaluated the efficacy of this method on multiple datasets and compared the APFDc and NAPFD evaluation metrics against models that were regularly retrained, with careful consideration of the statistical methods. Results: Our experimental evaluation of the Hellinger distance-based method demonstrated its efficacy and efficiency in detecting retraining points and reducing the associated costs. However, the performance of this method may vary depending on the dataset. Conclusions: Our findings suggest that data drift detection methods can assist in identifying retraining points for ML models in CI environments, while significantly reducing the required retraining time. These methods can be helpful for practitioners who lack specialized knowledge of software projects, enabling them to maintain ML model accuracy.


翻译:暂无翻译

0
下载
关闭预览

相关内容

机器学习入门的经验与建议
专知会员服务
94+阅读 · 2019年10月10日
【哈佛大学商学院课程Fall 2019】机器学习可解释性
专知会员服务
105+阅读 · 2019年10月9日
Hierarchically Structured Meta-learning
CreateAMind
27+阅读 · 2019年5月22日
Transferring Knowledge across Learning Processes
CreateAMind
29+阅读 · 2019年5月18日
强化学习的Unsupervised Meta-Learning
CreateAMind
18+阅读 · 2019年1月7日
Unsupervised Learning via Meta-Learning
CreateAMind
43+阅读 · 2019年1月3日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
18+阅读 · 2018年12月24日
国家自然科学基金
0+阅读 · 2012年12月31日
国家自然科学基金
0+阅读 · 2012年12月31日
GeomCA: Geometric Evaluation of Data Representations
Arxiv
11+阅读 · 2021年5月26日
Arxiv
15+阅读 · 2020年12月17日
Arxiv
112+阅读 · 2020年2月5日
VIP会员
相关资讯
Hierarchically Structured Meta-learning
CreateAMind
27+阅读 · 2019年5月22日
Transferring Knowledge across Learning Processes
CreateAMind
29+阅读 · 2019年5月18日
强化学习的Unsupervised Meta-Learning
CreateAMind
18+阅读 · 2019年1月7日
Unsupervised Learning via Meta-Learning
CreateAMind
43+阅读 · 2019年1月3日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
18+阅读 · 2018年12月24日
相关论文
GeomCA: Geometric Evaluation of Data Representations
Arxiv
11+阅读 · 2021年5月26日
Arxiv
15+阅读 · 2020年12月17日
Arxiv
112+阅读 · 2020年2月5日
相关基金
Top
微信扫码咨询专知VIP会员