Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (7): 2022-2029. DOI: 10.11772/j.issn.1001-9081.2021050726

• Artificial Intelligence •

Analysis and improvement of AdaBoost's sample weight and combination coefficient

Liang ZHU, Hua XU, Jinhai CHENG, Shen ZHU

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2021-05-08 Revised: 2022-02-10 Accepted: 2022-02-18 Online: 2022-03-08 Published: 2022-07-10
  • Corresponding author: Hua XU
  • About the authors: ZHU Liang, born in 1994 in Fuyang, Anhui, M.S. candidate, CCF member. His research interests include machine learning and data mining.
    CHENG Jinhai, born in 1997 in Nantong, Jiangsu, M.S. candidate. His research interests include data mining, machine learning and embedded software.
    ZHU Shen, born in 1997 in Zhoukou, Henan, M.S. candidate. His research interests include data mining and machine learning.

Abstract:

Aiming at the problems of the low linear combination efficiency of base classifiers and excessive attention to hard examples in the Adaptive Boosting (AdaBoost) algorithm, two improved algorithms based on margin theory, sample Weight and Parameterization of Improved AdaBoost (WPIAda) and sample Weight and Parameterization of Improved AdaBoost-Multitude (WPIAda.M), were proposed. Firstly, both WPIAda and WPIAda.M divide the update of sample weights into four situations: they increase the weights of samples whose margin changes from positive to negative, thereby suppressing the negative movement of the margin and reducing the number of samples whose margin sits at zero. Secondly, according to the error rates of the base classifiers and the distribution of the sample weights, WPIAda.M gives a new method for solving the base classifier coefficients, thereby improving the combination efficiency of the base classifiers. On 10 UCI datasets, compared with algorithms such as WLDF_Ada (dfAda), skAda and SWA-Adaboost (swaAda), WPIAda and WPIAda.M reduced the test error by 7.46 and 7.64 percentage points on average respectively, and increased the Area Under Curve (AUC) by 11.65 and 11.92 percentage points respectively. Experimental results show that WPIAda and WPIAda.M can effectively reduce the attention paid to hard examples, and that WPIAda.M can integrate base classifiers more efficiently; both algorithms therefore further improve classification performance.

Key words: Adaptive Boosting (AdaBoost), margin theory, sample weight, base classifier, combination efficiency

CLC number:
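The abstract describes WPIAda's four-case, margin-based weight update without giving the exact formulas. For orientation only, a minimal sketch of classic discrete AdaBoost with decision stumps, tracking the per-sample margin y·f(x) before and after each round (the quantity the paper's four-case update conditions on), might look like this; the stump learner, toy data and all names are illustrative assumptions, not the paper's method:

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    # Decision stump: predict +1/-1 by thresholding a single feature.
    return sign * np.where(X[:, feat] <= thresh, 1.0, -1.0)

def best_stump(X, y, w):
    # Exhaustive search for the stump with the lowest weighted error.
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1.0, -1.0):
                pred = stump_predict(X, feat, thresh, sign)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

def adaboost(X, y, T=10):
    n = len(y)
    w = np.full(n, 1.0 / n)   # sample weight distribution D_t(i)
    f = np.zeros(n)           # running ensemble score sum_t alpha_t * h_t(x_i)
    ensemble = []
    for _ in range(T):
        err, feat, thresh, sign = best_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:        # weak-learning condition violated; stop
            break
        alpha = 0.5 * np.log((1 - err) / err)   # classic AdaBoost coefficient
        pred = stump_predict(X, feat, thresh, sign)
        prev_margin = y * f                      # margin before this round
        f += alpha * pred
        margin = y * f                           # margin after this round
        # Classic multiplicative update shown here. WPIAda would instead split
        # samples into four cases by how (prev_margin, margin) changed, e.g.
        # raising the weight of samples whose margin turned positive->negative;
        # WPIAda.M would additionally recompute alpha from err and the weight
        # distribution.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def predict(ensemble, X):
    # Sign of the weighted vote of all stumps.
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)
```

On a linearly separable toy set the first stump already fits the data, so the loop mainly illustrates where the margin bookkeeping lives; the interesting behaviour of the four-case update only appears on data with hard examples.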