Journal of Computer Applications, 2018, Vol. 38, Issue (3): 824-828. DOI: 10.11772/j.issn.1001-9081.2017081935

• Computer software technology •

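Impact of regression algorithms on performance of defect number prediction model

FU Zhongwang1,2,3, XIAO Rong1,2, YU Xiao2, GU Yi1

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan, Hubei 430062, China;
  2. State Key Laboratory of Software Engineering (Wuhan University), Wuhan, Hubei 430072, China;
  3. Educational Informationalization Engineering Research Center of Hubei Province, Wuhan, Hubei 430062, China
  • Received: 2017-08-07 Revised: 2017-09-22 Online: 2018-03-10 Published: 2018-03-07
  • Corresponding author: FU Zhongwang
  • About the authors: FU Zhongwang (born 1993), male, from Liaocheng, Shandong, M.S. candidate; research interests: data mining, software engineering. XIAO Rong (born 1980), female, from Yichang, Hubei, lecturer, Ph.D. candidate; research interest: software engineering. YU Xiao (born 1994), male, from Hanchuan, Hubei, Ph.D. candidate; research interests: software engineering, deep learning. GU Yi (born 1996), male, from Dali, Yunnan; research interest: machine learning.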

Abstract: Existing studies that evaluate models for predicting the number of software defects overlook the imbalanced data distribution in defect datasets and therefore rely on performance measures that are ill-suited to these regression models. To address this, Fault-Percentile-Average (FPA) was adopted as the performance measure, and the impact of different regression algorithms on the performance of defect number prediction models was examined. Experiments on six open-source datasets from the PROMISE repository analyzed how ten regression algorithms affect the prediction results and how the algorithms differ from one another. The results show that models built with different regression algorithms differ in predictive performance, and that the gradient boosting regression algorithm and the Bayesian ridge regression algorithm achieve better performance overall.

Key words: defect number prediction, imbalanced data distribution, regression algorithm

CLC number:
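
The abstract names Fault-Percentile-Average (FPA) as the performance measure but does not give its formula. The sketch below assumes the commonly cited definition: rank the K modules by predicted defect count from highest to lowest, and for each m from 1 to K take the fraction of all actual defects that falls in the top m modules; FPA is the mean of those K fractions. The function name, the tie-breaking rule, and the sample data are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def fault_percentile_average(actual: Sequence[float],
                             predicted: Sequence[float]) -> float:
    """FPA under the commonly cited definition (assumed; the abstract gives no formula)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    k = len(actual)
    total = sum(actual)
    if k == 0 or total == 0:
        raise ValueError("need at least one module and at least one actual defect")

    # Rank modules from most to fewest predicted defects.
    # Ties keep their input order (an arbitrary choice made for this sketch).
    order = sorted(range(k), key=lambda i: predicted[i], reverse=True)

    covered = 0.0   # actual defects contained in the top-m modules so far
    fpa_sum = 0.0   # running sum of the top-m defect fractions
    for i in order:
        covered += actual[i]
        fpa_sum += covered / total
    return fpa_sum / k


if __name__ == "__main__":
    # Hypothetical, imbalanced defect counts: half of the modules are defect-free.
    actual = [0, 0, 1, 0, 5, 2]
    print(fault_percentile_average(actual, actual))                # ideal ranking
    print(fault_percentile_average(actual, [-a for a in actual]))  # reversed ranking
```

On this made-up data the ideal ranking scores about 0.917 and the fully reversed ranking scores 0.25. Because the measure depends only on how modules are ordered, a model is not rewarded simply for predicting values close to zero for the many defect-free modules, which is one plausible reason a ranking measure suits imbalanced defect data.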