Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (8): 2291-2294. DOI: 10.11772/j.issn.1001-9081.2014.08.2291

• Artificial Intelligence •

Software defects prediction based on under-sampling and ensemble algorithm

LI Yong1,2

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing Jiangsu 210016, China
  2. Key Laboratory of Network Information Security and Public Opinion Analysis, Xinjiang Normal University, Urumqi Xinjiang 830054, China
  • Received: 2014-03-07 Revised: 2014-04-12 Online: 2014-08-01 Published: 2014-08-10
  • Contact: LI Yong
  • About the author: LI Yong (born 1983), male, from Jinzhong, Shanxi; lecturer, Ph.D. candidate, CCF member. Research interests: machine learning, software intelligence.
  • Supported by:

    Scientific Research Program of Higher Education Institutions of Xinjiang Uygur Autonomous Region; Youth Fund for Humanities and Social Sciences Research of the Ministry of Education; National Natural Science Foundation of China; Key Laboratory Fund of Xinjiang Normal University

Abstract:

Software defect prediction is an important means of improving test efficiency and assuring software reliability. To improve the accuracy of software defect prediction, a model combining under-sampling with an ensemble of decision tree classifiers was proposed. Firstly, to account for the class imbalance of software defect data, the sampling degree was determined from the imbalance rate of the data, and random under-sampling was applied to rebalance the data. Then, several decision tree sub-classifiers were trained using the random sampling principle of Bagging. Finally, the defect prediction model was constructed by majority vote. Simulation experiments were carried out on the public NASA MDP software defect datasets. The experimental results show that, compared with three baseline methods, the proposed model reduces the Probability of False alarm (PF) by more than 10% while maintaining the probability of detection, and all comprehensive evaluation indexes improve significantly. The model has a low false alarm rate together with high prediction accuracy and stability.
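The three steps in the abstract (rebalance by random under-sampling, train sub-classifiers on Bagging-style bootstrap samples, combine by majority vote) can be illustrated with a minimal sketch. This is not the author's implementation. It assumes a 1:1 target class ratio, because the abstract does not give the rule that maps the imbalance rate to a sampling degree, and it substitutes depth-1 decision trees (stumps) for full decision trees to stay short. All function names are hypothetical. The `pd_pf` helper uses the definitions commonly used in defect prediction, PD = TP / (TP + FN) and PF = FP / (FP + TN), which the abstract itself does not spell out.

```python
import random


def undersample(X, y, rng):
    """Randomly discard majority-class rows until both classes are the same size.

    Assumption: a 1:1 target ratio; the paper derives its sampling degree from
    the imbalance rate, but the exact rule is not given in the abstract.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def train_stump(X, y):
    """Fit a depth-1 decision tree by exhaustive search over (feature, threshold)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for above in (0, 1):  # label predicted when row[f] > t
                errors = sum(
                    (above if row[f] > t else 1 - above) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    return best[1:]  # (feature index, threshold, label above threshold)


def predict_stump(stump, row):
    f, t, above = stump
    return above if row[f] > t else 1 - above


def fit_ensemble(X, y, n_estimators=11, seed=0):
    """Under-sample once, then train each stump on a bootstrap sample (Bagging)."""
    rng = random.Random(seed)
    Xb, yb = undersample(X, y, rng)
    n = len(Xb)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        stumps.append(train_stump([Xb[i] for i in idx], [yb[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Majority vote: predict 'defective' (1) if more than half the stumps say so."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0


def pd_pf(y_true, y_pred):
    """Return (PD, PF) = (TP / (TP + FN), FP / (FP + TN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf
```

A caller would pass a list of numeric feature rows (for example, static code metrics per module) and 0/1 labels to `fit_ensemble`, then score held-out modules with `predict` and summarize them with `pd_pf`. An odd `n_estimators` avoids tied votes in the two-class case.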

CLC Number: