Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (8): 2225-2231. DOI: 10.11772/j.issn.1001-9081.2020101584

Special Topic: Artificial Intelligence

• Artificial Intelligence •

Improved AdaBoost algorithm based on base classifier coefficients and diversity

ZHU Liang, XU Hua, CUI Xin

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi Jiangsu 214122, China
  • Received: 2020-10-12  Revised: 2021-01-11  Online: 2021-08-10  Published: 2021-01-27
  • Corresponding author: XU Hua
  • About the authors: ZHU Liang (1994-), male, born in Fuyang, Anhui, M.S. candidate, CCF member, research interests: machine learning and data mining; XU Hua (1978-), female, born in Wuxi, Jiangsu, Ph.D., associate professor, research interests: computational intelligence, shop scheduling and big data; CUI Xin (1997-), male, born in Nanyang, Henan, M.S. candidate, research interests: data mining and machine learning.

Abstract: Aiming at the low efficiency of the linear combination of base classifiers and the overfitting of the traditional AdaBoost algorithm, an improved algorithm based on base classifier coefficients and diversity, WD AdaBoost (AdaBoost based on Weight and Double-fault measure), was proposed. Firstly, according to the error rates of the base classifiers and the distribution of the sample weights, a new method for solving the base classifier coefficients was given to improve the combination efficiency of the base classifiers. Secondly, in the base classifier selection strategy, the double-fault measure was introduced into the WD AdaBoost algorithm to increase the diversity among the base classifiers. On five datasets from different real-world application fields, the CeffAda algorithm, which adopts the new base classifier coefficient solution method, reduced the test error by 1.2 percentage points on average compared with the traditional AdaBoost algorithm; meanwhile, the WD AdaBoost algorithm achieved a lower error rate than the WLDF_Ada, AD_Ada (Adaptive to Detection AdaBoost), sk_AdaBoost and other compared algorithms. Experimental results show that the WD AdaBoost algorithm can integrate base classifiers more efficiently, resist overfitting, and improve classification performance.
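Since this page carries only the abstract, the minimal Python sketch below merely illustrates the two ingredients the abstract names: a per-round base classifier coefficient and the double-fault diversity measure, conventionally defined as the fraction of samples misclassified by both classifiers of a pair. The classic AdaBoost coefficient formula stands in for the paper's new solution method (which the abstract does not spell out), and the candidate-pool selection against the previous round's classifier is an illustrative assumption, not the published WD AdaBoost procedure; X and y are assumed to be NumPy arrays with labels in {-1, +1}.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def double_fault(pred_a, pred_b, y):
    # Conventional double-fault measure: fraction of samples that BOTH
    # classifiers misclassify; a lower value means a more diverse pair.
    return np.mean((pred_a != y) & (pred_b != y))

def boost(X, y, rounds=10, n_candidates=5, seed=0):
    # y is assumed to take values in {-1, +1}.
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)          # sample weight distribution D_t
    ensemble, prev_pred = [], None   # (alpha_t, h_t) pairs
    for _ in range(rounds):
        best = None
        # Train a small pool of weak candidates on weighted resamples and keep
        # the one with the lowest double-fault measure against the classifier
        # chosen in the previous round (illustrative selection strategy).
        for _ in range(n_candidates):
            idx = rng.choice(n, size=n, p=w)
            h = DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx])
            pred = h.predict(X)
            err = w[pred != y].sum()
            if err >= 0.5:           # discard candidates no better than chance
                continue
            df = 0.0 if prev_pred is None else double_fault(pred, prev_pred, y)
            if best is None or df < best[0]:
                best = (df, err, h, pred)
        if best is None:
            break
        _, err, h, pred = best
        # Classic AdaBoost coefficient; the paper replaces this with a formula
        # that also uses the weight distribution, not given in the abstract.
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)   # re-weight: up-weight mistakes
        w /= w.sum()
        ensemble.append((alpha, h))
        prev_pred = pred
    return ensemble

def predict(ensemble, X):
    # Sign of the coefficient-weighted vote of the base classifiers.
    return np.sign(sum(alpha * h.predict(X) for alpha, h in ensemble))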

Key words: weight, diversity, AdaBoost, double-fault measure, classification performance
