Impact of regression algorithms on the performance of models for predicting the number of software defects

Journal of Computer Applications

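Fu Zhongwang¹, Yu Xiao², Xiao Rong¹, Gu Yi³

1. School of Computer Science and Information Engineering, Hubei University
2. State Key Laboratory of Software Engineering, Wuhan University
3. State Key Laboratory of Software Engineering, Wuhan University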

Received: 2017-08-07 Revised: 2017-09-22 Online: 2017-09-22 Published: 2017-10-18
Corresponding author: Yu Xiao
About the authors: Fu Zhongwang, born in 1993 in Liaocheng, Shandong, M.S. candidate. His research interests include data mining and software engineering. Yu Xiao, born in 1994 in Hanchuan, Hubei, Ph.D. candidate. His research interests include software engineering and deep learning. Xiao Rong, born in 1980 in Yichang, Hubei, lecturer. Her research interests include software engineering. Gu Yi, born in 1996 in Dali, Yunnan, undergraduate student. His research interests include machine learning.

Abstract:

Existing studies that evaluate models for predicting the number of software defects did not account for the imbalanced data distribution in defect datasets, and therefore relied on performance measures for regression models that are ill-suited to such data. To address this, Fault-Percentile-Average (FPA) was adopted as the performance measure, and the degree to which different regression algorithms affect the performance of models for predicting the number of defects was examined. Experiments were conducted on six open-source datasets from the PROMISE repository to analyze how ten regression algorithms affect the prediction results and how the algorithms differ from one another. The results show that models built with different regression algorithms differ in predictive performance, and that the Gradient Boosting Regression and Bayesian Ridge Regression algorithms perform better overall.

Key words: prediction of the number of software defects, imbalanced data distribution, regression algorithm
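
The abstract names FPA as the performance measure but does not define it. The sketch below follows the definition commonly used in the defect prediction literature: modules are ranked by predicted defect count, and FPA is the average, over m = 1..k, of the share of actual defects found in the m highest-ranked modules. Whether the paper computes FPA in exactly this way is an assumption, and the function name `fault_percentile_average` and the sample numbers are hypothetical.

```python
def fault_percentile_average(actual, predicted):
    """Illustrative FPA: mean share of actual defects captured by the
    top-m predicted modules, averaged over m = 1..k (assumed definition)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    k = len(actual)
    total = sum(actual)
    if k == 0 or total == 0:
        raise ValueError("need at least one module and one actual defect")

    # Rank module indices from highest to lowest predicted defect count.
    order = sorted(range(k), key=lambda i: predicted[i], reverse=True)

    captured = 0   # actual defects in the top-m modules so far
    fpa = 0.0
    for i in order:
        captured += actual[i]
        fpa += captured / total
    return fpa / k


# Hypothetical data: four modules, most of them defect-free (imbalanced).
actual = [0, 0, 1, 4]
good_ranking = [0.1, 0.2, 1.0, 3.0]   # ranks defective modules first
bad_ranking = [3.0, 1.0, 0.2, 0.1]    # ranks defective modules last

print(round(fault_percentile_average(actual, good_ranking), 2))  # 0.95
print(round(fault_percentile_average(actual, bad_ranking), 2))   # 0.3
```

Because FPA depends only on how modules are ranked, not on the absolute size of the prediction errors, it is less distorted than error-based regression measures when most modules contain zero defects.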

CLC number: TP181

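Fu Zhongwang, Yu Xiao, Xiao Rong, Gu Yi. Impact of regression algorithms on the performance of models for predicting the number of software defects[J]. Journal of Computer Applications.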

