计算机应用 (Journal of Computer Applications), 2015, Vol. 35, Issue (5): 1306-1309. DOI: 10.11772/j.issn.1001-9081.2015.05.1306

• Artificial Intelligence

Application of an optimized support vector machine ensemble classifier to the classification of imbalanced datasets

ZHANG Shaoping (章少平), LIANG Xuechun (梁雪春)

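  1. School of Automation and Electrical Engineering, Nanjing Technology University, Nanjing 211816
  • Received: 2014-12-16 Revised: 2015-01-14 Online: 2015-05-10 Published: 2015-05-14
  • Corresponding author: LIANG Xuechun
  • About the authors: ZHANG Shaoping (1990-), male, born in Yancheng, Jiangsu, master's student; research interests: machine learning, data mining. LIANG Xuechun (1969-), female, born in Nanjing, Jiangsu, professor, Ph.D.; research interests: prediction and modeling of complex systems.
  • Funding: Postgraduate Research and Innovation Program of Jiangsu Province Regular Higher Education Institutions (SJLX_0334); Soft Science Project of the Science and Technology Department of Jiangsu Province (BR2012043).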

Applications of unbalanced data classification based on optimized support vector machine ensemble classifier

ZHANG Shaoping, LIANG Xuechun   

  1. School of Automation and Electrical Engineering, Nanjing Technology University, Nanjing Jiangsu 211816, China
  • Received:2014-12-16 Revised:2015-01-14 Online:2015-05-10 Published:2015-05-14

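Abstract:

Most traditional classification algorithms are built on the assumption of balanced datasets, and their performance often drops markedly when the sample data are imbalanced. For the imbalanced data classification problem, an optimized support vector machine (SVM) ensemble classifier model is proposed. KSMOTE and Bootstrap are used to preprocess the imbalanced data, the corresponding SVM models are generated, and their parameters are optimized with the complex method; finally, the optimized parameters are used to generate the SVM ensemble classifier model in parallel, and the classification result is obtained by a voting mechanism. Experiments on five UCI benchmark datasets show that the optimized SVM ensemble classifier model achieves clearly higher classification accuracy than the SVM model, the optimized SVM model, and other models; the experiments also verify how different bootNum values affect classifier performance.

Keywords: imbalanced data, classification algorithm, support vector machine, ensemble classifier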
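The abstract names KSMOTE and Bootstrap for preprocessing and a voting mechanism for combining the SVMs, but does not define KSMOTE. The sketch below shows generic building blocks under stated assumptions: plain SMOTE-style interpolation stands in for KSMOTE (whose exact variant is not described here), the number of bootstrap replicates plays the role of `bootNum`, and predictions are combined by simple majority vote. SVM training itself is omitted, and the names `smote`, `bootstrap_samples` and `majority_vote` are hypothetical helpers, not the authors' code.

```python
import random
from collections import Counter


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, rng=random):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Stand-in for KSMOTE (assumption): each new point lies on the segment
    between a random minority point and one of its k nearest minority
    neighbours. Requires len(minority) >= 2.
    """
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def bootstrap_samples(data, boot_num, rng=random):
    """Draw boot_num bootstrap replicates: same size as data, with replacement."""
    return [[rng.choice(data) for _ in data] for _ in range(boot_num)]


def majority_vote(predictions):
    """predictions[m][i] is model m's label for sample i; returns voted labels.

    Ties go to the label encountered first, so an odd model count avoids them.
    """
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2-D data (hypothetical): 40 majority points (label 0), 8 minority points (label 1)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(8)]
    # Oversample the minority class up to the size of the majority class
    balanced_minority = minority + smote(minority, len(majority) - len(minority), k=5, rng=rng)
    data = [(x, 0) for x in majority] + [(x, 1) for x in balanced_minority]
    # One SVM would be trained per replicate; here we only report class counts
    for replicate in bootstrap_samples(data, boot_num=5, rng=rng):
        print(Counter(label for _, label in replicate))
    # Stand-in predictions of 3 base models on 3 samples, combined by vote
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))  # -> [1, 0, 1]
```

Keeping each step a separate function makes it easy to swap in a different oversampler or to train the base classifiers in parallel, one per bootstrap replicate.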

Abstract:

Most traditional classification algorithms are designed for balanced datasets; when the samples are imbalanced, their performance often degrades significantly. For the classification of imbalanced data, an optimized Support Vector Machine (SVM) ensemble classifier model was proposed. First, the model used KSMOTE and Bootstrap to preprocess the imbalanced data and generated the corresponding SVM models in parallel. Then the parameters of these SVM models were optimized with the complex method. Finally, the optimized SVM ensemble classifier model was built from these parameters and produced the final result through a voting mechanism. Experiments on five UCI benchmark datasets show that the optimized SVM ensemble classifier model achieves higher classification accuracy than the SVM model, the optimized SVM model, and other models. The results also verify the effect of different bootNum values on the performance of the optimized SVM ensemble classifier.

Key words: unbalanced data, classification method, Support Vector Machine (SVM), ensemble classifier
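The abstract says the SVM parameters are tuned with the complex method but gives no further detail. As a hedged illustration, the sketch below implements a basic version of Box's complex method for box-constrained minimization, applied to a search over (log2 C, log2 gamma). The RBF-kernel parameterization, the search ranges, the reflection factor 1.3, the stopping rule, and the objective `cv_error_surrogate` are all assumptions; `cv_error_surrogate` is a hypothetical smooth stand-in, whereas a real objective would train an SVM and return its validation error.

```python
import math
import random


def complex_method(f, lo, hi, alpha=1.3, max_iter=1000, tol=1e-6, rng=random):
    """Minimise f over the box lo <= x <= hi with a basic Box complex method.

    Keeps k = 2n vertices. Each step reflects the worst vertex through the
    centroid of the others (over-reflection factor alpha), clamps it to the
    bounds, and moves it halfway toward the centroid while it is still no
    better than the vertex it replaces.
    """
    n = len(lo)
    k = 2 * n

    def clamp(x):
        return [min(max(v, l), h) for v, l, h in zip(x, lo, hi)]

    pts = [[l + rng.random() * (h - l) for l, h in zip(lo, hi)] for _ in range(k)]
    vals = [f(p) for p in pts]
    for _ in range(max_iter):
        best = min(range(k), key=vals.__getitem__)
        # Stop once every vertex lies within tol of the best one
        if max(math.dist(p, pts[best]) for p in pts) < tol:
            break
        w = max(range(k), key=vals.__getitem__)  # index of the worst vertex
        c = [sum(p[d] for i, p in enumerate(pts) if i != w) / (k - 1) for d in range(n)]
        x = clamp([cd + alpha * (cd - wd) for cd, wd in zip(c, pts[w])])
        fx = f(x)
        for _ in range(30):
            if fx < vals[w]:
                break
            # Still no better than the worst vertex: pull halfway to the centroid
            x = [(xd + cd) / 2 for xd, cd in zip(x, c)]
            fx = f(x)
        pts[w], vals[w] = x, fx
    best = min(range(k), key=vals.__getitem__)
    return pts[best], vals[best]


def cv_error_surrogate(params):
    """HYPOTHETICAL stand-in for an SVM's validation error at (log2 C, log2 gamma)."""
    log2_c, log2_gamma = params
    return 0.1 + 0.01 * ((log2_c - 3) ** 2 + (log2_gamma + 2) ** 2)


if __name__ == "__main__":
    lo, hi = [-5.0, -15.0], [15.0, 3.0]  # assumed log2 search ranges
    best, err = complex_method(cv_error_surrogate, lo, hi, rng=random.Random(42))
    print(f"log2 C = {best[0]:.3f}, log2 gamma = {best[1]:.3f}, error = {err:.4f}")
    print(f"C = {2 ** best[0]:.3f}, gamma = {2 ** best[1]:.5f}")
```

Clamping reflected points to the bounds is a common simplification of Box's original rule for handling bound violations; because the centroid always lies inside the box, the subsequent halving steps stay feasible.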
