Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (6): 1692-1696. DOI: 10.11772/j.issn.1001-9081.2017.06.1692

• Artificial Intelligence •

Improved multi-class AdaBoost algorithm based on stagewise additive modeling using a multi-class exponential loss function

ZHAI Xiyang (翟夕阳)1, WANG Xiaodan (王晓丹)1, LEI Lei (雷蕾)1, WEI Xiaohui (魏晓辉)2

  1. Institute of Air Defense and Anti-Missile, Air Force Engineering University, Xi'an Shaanxi 710051, China;
  2. Hospital 463 of PLA, Shenyang Liaoning 110042, China
  • Received: 2016-11-21 Revised: 2017-01-10 Online: 2017-06-10 Published: 2017-06-14
  • Corresponding author: WANG Xiaodan
  • About the authors: ZHAI Xiyang (born 1994), male, from Kaifeng, Henan, M.S. candidate; research interests: intelligent information processing, pattern recognition. WANG Xiaodan (born 1966), female, from Hanzhong, Shaanxi, Ph.D., professor; research interests: intelligent information processing, machine learning. LEI Lei (born 1988), female, from Nanchong, Sichuan, Ph.D. candidate; research interests: intelligent information processing, pattern recognition. WEI Xiaohui (born 1966), male, from Shenyang, Liaoning, senior engineer; research interests: computer information systems.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61273275, 61503407).

Abstract: Stagewise Additive Modeling using a Multi-class Exponential loss function (SAMME) is a multi-class Adaptive Boosting (AdaBoost) algorithm. To further improve the performance of SAMME, the influence of using weighted probability and pseudo loss on the algorithm was studied, and on that basis a dynamically weighted AdaBoost algorithm named SAMME with Resampling and Dynamic weighting (SAMME.RD) was proposed, in which each base classifier is weighted according to how it classifies the effective neighborhood of a sample. Firstly, whether to use weighted probability and pseudo loss was determined. Then, the effective neighborhood of the sample to be tested was found in the training set. Finally, the weighting coefficient of each base classifier was determined from its classification results on that effective neighborhood. Experiments on UCI datasets show that calculating the weighting coefficient of a base classifier from the real error rate works better; that selecting base classifiers by real probability works better when the dataset has fewer classes and a balanced distribution; and that selecting base classifiers by weighted probability works better when the dataset has more classes and an imbalanced distribution. The proposed SAMME.RD algorithm can effectively improve the classification accuracy of multi-class AdaBoost.

Key words: ensemble learning, multi-class classification, Adaptive Boosting (AdaBoost) algorithm, Stagewise Additive Modeling using a Multi-class Exponential loss function (SAMME), dynamic weighted fusion
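
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SAMME.RD implementation: the abstract does not define the effective neighborhood or the exact weighting rule, so the sketch assumes a plain Euclidean k-nearest-neighbor set as the neighborhood and reuses the standard SAMME coefficient, ln((1 - err) / err) + ln(K - 1), on the local error rate. The function names (`samme_alpha`, `k_nearest`, `predict_dynamic`) and the parameter `k` are hypothetical, and base classifiers are assumed to be callables that map an (n, d) array to n integer labels.

```python
import numpy as np


def samme_alpha(err, n_classes, eps=1e-10):
    """Standard SAMME coefficient for a base classifier with error rate err."""
    err = np.clip(err, eps, 1.0 - eps)  # avoid log(0) and division by zero
    return np.log((1.0 - err) / err) + np.log(n_classes - 1.0)


def k_nearest(X_train, x, k):
    """Indices of the k training samples closest to x (Euclidean distance).

    Assumption: this stands in for the 'effective neighborhood' of the
    abstract, whose actual definition is not given there.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    return np.argsort(dists)[:k]


def predict_dynamic(classifiers, X_train, y_train, x, n_classes, k=5):
    """Weighted vote where each classifier's weight depends on its error
    rate on the neighbors of x (hypothetical weighting rule)."""
    idx = k_nearest(X_train, x, k)
    votes = np.zeros(n_classes)
    for clf in classifiers:
        # Real (unweighted) error rate of this classifier on the neighborhood.
        local_err = np.mean(clf(X_train[idx]) != y_train[idx])
        alpha = samme_alpha(local_err, n_classes)
        if alpha > 0:  # skip classifiers no better than random guessing locally
            votes[clf(x[None, :])[0]] += alpha
    # If every classifier was skipped, all votes are zero and class 0 is returned.
    return int(np.argmax(votes))
```

In this sketch a base classifier that is accurate near the test sample dominates the vote there, even if its global error rate is mediocre, which is the intuition behind weighting by neighborhood performance.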
