Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (3): 650-654. DOI: 10.11772/j.issn.1001-9081.2017092226

• Artificial Intelligence •

Diversity analysis and improvement of AdaBoost

WANG Lingdi, XU Hua

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi Jiangsu 214122, China
  • Received: 2017-09-13 Revised: 2017-10-15 Online: 2018-03-10 Published: 2018-03-07
  • Corresponding author: XU Hua
  • About the authors: WANG Lingdi (1991-), female, born in Suzhou, Anhui, M. S. candidate; her main research interests include machine learning and data mining. XU Hua (1978-), female, born in Wuxi, Jiangsu, associate professor, Ph. D.; her main research interests include computational intelligence, job-shop scheduling, and big data.
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Jiangsu Province (BK20140165).



Abstract: To address the problem of measuring diversity among the weak classifiers created by AdaBoost, as well as AdaBoost's overfitting problem, an improved AdaBoost method based on the double-fault measure was proposed, building on an analysis of the relationship between four diversity measures and the classification accuracy of AdaBoost. Firstly, the Q statistic, correlation coefficient, disagreement measure and double-fault measure were evaluated experimentally on data sets from the UCI (University of California, Irvine) machine learning repository. Then, the correlation between diversity and the ensemble classifier's test error was quantified with the Pearson correlation coefficient. The results show that each measure tends to a stable value in the later stage of iteration; in particular, the double-fault measure changes in the same pattern across different data sets, increasing in the early stage and leveling off in the later stage of iteration. Finally, a weak-classifier selection strategy based on the double-fault measure was put forward. The experimental results show that, compared with other commonly used ensemble methods, the improved AdaBoost algorithm reduces the test error by 1.5 percentage points on average, and by up to 4.8 percentage points. Therefore, the proposed algorithm can further improve classification performance.

Key words: diversity, AdaBoost, ensemble learning, double-fault measure, weak classifier
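The four pairwise diversity measures named in the abstract are all defined over the 2×2 contingency table of two classifiers' correct/incorrect decisions on the same test set. As a rough illustration (not the authors' code; function and variable names are my own, and the denominators are assumed to be nonzero), they can be computed as follows:

```python
import numpy as np

def pairwise_diversity(pred_i, pred_j, y):
    """Pairwise diversity between two classifiers given their predictions
    pred_i, pred_j and the true labels y (all 1-D arrays of equal length).

    Returns (Q statistic, correlation coefficient, disagreement measure,
    double-fault measure). Denominators are assumed to be nonzero.
    """
    ci = (pred_i == y)            # correctness indicator of classifier i
    cj = (pred_j == y)            # correctness indicator of classifier j
    n = len(y)
    a = np.sum(ci & cj) / n       # fraction where both are correct
    b = np.sum(ci & ~cj) / n      # i correct, j wrong
    c = np.sum(~ci & cj) / n      # i wrong, j correct
    d = np.sum(~ci & ~cj) / n     # both wrong

    q = (a * d - b * c) / (a * d + b * c)                    # Q statistic
    rho = (a * d - b * c) / np.sqrt(
        (a + b) * (c + d) * (a + c) * (b + d))               # correlation
    dis = b + c                   # disagreement: fraction where they differ
    df = d                        # double-fault: fraction where both fail
    return q, rho, dis, df
```

Note that the double-fault measure is the only one of the four that distinguishes "both wrong" from "both correct", which is why it is the natural candidate for steering weak-classifier selection: a pair of classifiers with a low double-fault value rarely fails on the same examples.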
