Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (4): 1100-1104. DOI: 10.11772/j.issn.1001-9081.2017.04.1100

• Artificial Intelligence •

  • Corresponding author: XU Yewang
  • About the authors: XU Yewang (1991-), male, born in Huai'an, Jiangsu, M. S. candidate, research interests: data mining, big data information security; WANG Yongli (1974-), male, born in Jiamusi, Heilongjiang, professor, Ph. D., research interests: database, data mining, big data processing, intelligent service, cloud computing; ZHAO Zhongwen (1974-), male, born in Beijing, associate professor, M. S., research interests: information system, multidimensional information, situation synthesis.

Fast ensemble method for strong classifiers based on instance

XU Yewang1, WANG Yongli1, ZHAO Zhongwen2   

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing Jiangsu 210094, China;
    2. National Key Laboratory of Complex Electronic System Simulation, Academy of Equipment, Beijing 101416, China
  • Received: 2016-07-29 Revised: 2016-09-28 Online: 2017-04-10 Published: 2017-04-19
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61170035, 61272420, 61502233), the Jiangsu Province Special Fund for Science and Technology Achievement Transformation (BA2013047), the Jiangsu Province Six Talent Peaks Project (WLW-004), the National Defense Science and Technology Key Laboratory Basic Research Project (DXZT-JC-ZZ-2013-019), the Pre-research Project of the Military Academy (62201070151), and the Fundamental Research Funds for the Central Universities (30916011328).



Abstract: Focusing on the issue that an ensemble classifier built from weak base classifiers must sacrifice a large amount of training time to obtain high precision, an ensemble method for strong classifiers based on instances, named Fast Strong-classifiers Ensemble (FSE), was proposed. Firstly, an evaluation method was used to eliminate substandard classifiers and rank the remaining classifiers by accuracy and diversity, yielding a set of classifiers with the highest accuracy and the greatest diversity. Secondly, the FSE algorithm broke the existing sample distribution by re-sampling, making the classifiers focus on the hard-to-learn samples. Finally, the ensemble was completed by determining the weight of each classifier accordingly. Experiments were conducted on a UCI dataset and a real-world dataset. The accuracy of Boosting reached 90.2% and 90.4% on the two datasets respectively, while the accuracy of FSE reached 95.6% and 93.9%; when reaching the same accuracy, the training time of the ensemble classifier built with FSE was shortened by 75% and 80% compared with the ensemble classifier built with Boosting. The experimental results show that the FSE ensemble model can effectively improve recognition accuracy and shorten training time.
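The three stages in the abstract (eliminating substandard classifiers, ranking the rest by accuracy and diversity, and re-weighting hard samples to derive classifier weights) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual FSE algorithm: the toy threshold classifiers, the accuracy cutoff `acc_min`, the pairwise-disagreement diversity measure, and the AdaBoost-style weight formula are all assumptions made for the sketch.

```python
import math

# Toy "strong classifiers": 1-D threshold rules standing in for trained models.
def make_threshold_clf(t):
    return lambda x: 1 if x >= t else 0

def accuracy(clf, X, y):
    return sum(clf(x) == yi for x, yi in zip(X, y)) / len(y)

def disagreement(c1, c2, X):
    # Pairwise diversity: fraction of samples the two classifiers label differently.
    return sum(c1(x) != c2(x) for x in X) / len(X)

def fse_select(clfs, X, y, acc_min=0.6):
    """Stages 1-2: drop substandard classifiers, then order by accuracy and diversity."""
    pool = [c for c in clfs if accuracy(c, X, y) >= acc_min]
    pool.sort(key=lambda c: accuracy(c, X, y), reverse=True)
    chosen, rest = [pool[0]], pool[1:]
    while rest:
        # Greedily take the classifier most different from those already chosen.
        rest.sort(key=lambda c: min(disagreement(c, s, X) for s in chosen),
                  reverse=True)
        chosen.append(rest.pop(0))
    return chosen

def fse_ensemble(clfs, X, y):
    """Stage 3: re-weight hard samples and derive per-classifier voting weights."""
    w = [1.0 / len(X)] * len(X)
    alphas = []
    for c in clfs:
        err = sum(wi for wi, x, yi in zip(w, X, y) if c(x) != yi)
        err = min(max(err, 1e-10), 1.0 - 1e-10)    # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)  # AdaBoost-style weight (assumption)
        alphas.append(alpha)
        # Emphasize the samples this classifier got wrong.
        w = [wi * math.exp(alpha if c(x) != yi else -alpha)
             for wi, x, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]

    def predict(x):
        vote = sum(a * (1 if c(x) == 1 else -1) for a, c in zip(alphas, clfs))
        return 1 if vote >= 0 else 0
    return predict

# Usage on a toy problem: the true label is 1 iff x >= 5.
X, y = list(range(10)), [0] * 5 + [1] * 5
clfs = [make_threshold_clf(t) for t in (3, 4, 5, 6)] + [lambda x: 0 if x >= 5 else 1]
predict = fse_ensemble(fse_select(clfs, X, y), X, y)
```

The inverted classifier at the end of `clfs` is filtered out by the accuracy cutoff, and the weighted vote of the surviving four classifiers recovers the true labeling on this toy data.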

Key words: strong classifier ensemble model, base classifier evaluation method, ensemble algorithm, sample distribution, ensemble learning
