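A Software Fault-Proneness Prediction Model Based on a Regularized Boosting Algorithm and Metric Selection Techniques

Project title: A Software Fault-Proneness Prediction Model Based on a Regularized Boosting Algorithm and Metric Selection Techniques

Project number: No.61300069

Project type: Young Scientists Fund project

Year of approval: 2014

Discipline: Automation technology, computer technology

Project author: Wang Shihai (王世海)

Affiliation: Beihang University (北京航空航天大学)

Funding: 230,000 CNY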

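Abstract (translated from Chinese): As one of the important means of assuring software quality, methods for building software fault-proneness prediction models have become a focus of research. Pattern recognition techniques, which have strong model-building ability, have already seen some use in software fault-proneness prediction. However, software fault data has an inherently imbalanced distribution and redundant input information (metrics), and these two characteristics severely limit the performance of existing pattern-recognition-based prediction models. Traditional imbalanced-data learning algorithms expand the sample set simply by adding artificial samples; they do not handle the uncertain class labels that the added samples introduce, and they cannot remove redundant information. This project studies the loss function of the Boosting algorithm theoretically. By incorporating prior information and applying regularized learning to the artificial samples, it proposes a loss function suited to imbalanced data and constructs a new Boosting algorithm that can perform regularized learning and feature extraction on artificially expanded imbalanced data, exploiting the information in the data as fully as possible and improving model accuracy. The end goal is a high-performance software fault-proneness prediction model capable of both metric selection and learning from imbalanced data.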

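Keywords (translated from Chinese): imbalanced data; regularized Boosting; pattern recognition; software metric selection; software fault-proneness prediction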

English abstract: Software fault-proneness prediction is an effective approach to significantly improving the quality of software systems. Pattern recognition methods have shown strong modeling ability and have been applied to the software fault-proneness prediction task. However, how best to employ pattern recognition is still an open question because of two characteristics of software fault data: imbalanced data distribution and information redundancy. In the pattern recognition area, imbalanced data learning itself remains an open challenge. Several approaches have been proposed or extended to address it with the synthetic oversampling technique (SOTE). To the best of our knowledge, however, none of them takes into account the uncertainty of the labels (class information) of the synthetic samples. Many software metrics have been proposed, and they contain redundant information (noise) for software fault-proneness prediction. In this project we will propose a novel Boosting cost function that introduces prior knowledge, and build a regularized Boosting algorithm for imbalanced data learning that treats the original data and the synthetic data separately and also has feature selection ability. Finally, model performance in imbalanced data learning tasks will be improved dramatically. Based on the research in this project

English keywords: unbalanced data; regularized Boosting; software metric selection; software defect propensity prediction; pattern recognition
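
Illustrative sketch: The abstract describes a Boosting learner that treats original and synthetic samples separately and also selects metrics. The record does not give the project's loss function, so the Python sketch below is not that algorithm. It is plain AdaBoost over decision stumps, with one stand-in for the "prior information" idea: synthetic samples start with a lower weight than original ones. The names `interpolate_minority`, `fit_boost`, and `synthetic_weight`, the toy data, and the value 0.5 are all assumptions made for illustration.

```python
import math
import random


def interpolate_minority(minority, n_new, rng):
    """Create synthetic minority samples on segments between random pairs."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic


def stump_predict(x, feature, threshold, polarity):
    """Predict +polarity if the chosen metric exceeds the threshold."""
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively find the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def fit_boost(X, y, init_w, rounds=3):
    """Standard AdaBoost; only the initial weights differ from the textbook."""
    total = sum(init_w)
    w = [wi / total for wi in init_w]
    model = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, thr, pol))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    score = sum(a * stump_predict(x, j, thr, pol) for a, j, thr, pol in model)
    return 1 if score > 0 else -1


# Toy data (hypothetical): metric 0 separates the classes, metric 1 is noise.
rng = random.Random(0)
majority = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(20)]  # label -1
minority = [[rng.uniform(2, 3), rng.uniform(0, 1)] for _ in range(4)]   # label +1
synthetic = interpolate_minority(minority, 16, rng)

X = majority + minority + synthetic
y = [-1] * len(majority) + [1] * (len(minority) + len(synthetic))

synthetic_weight = 0.5  # assumed prior: trust synthetic labels less
init_w = [1.0] * (len(majority) + len(minority)) + [synthetic_weight] * len(synthetic)

model = fit_boost(X, y, init_w)
selected = {j for _, j, _, _ in model}  # metrics actually used by the ensemble
print("metrics used:", sorted(selected))
```

On this toy data every stump splits on metric 0, so the noisy metric 1 is never used. This is a crude form of the metric selection the abstract refers to. A loss function that regularizes the synthetic samples, as the project proposes, would replace the fixed `synthetic_weight` with a principled term.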
