Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (3): 650-654. DOI: 10.11772/j.issn.1001-9081.2017092226

• Artificial Intelligence •

Diversity analysis and improvement of AdaBoost

WANG Lingdi, XU Hua

  1. School of Internet of Things Engineering, Jiangnan University, Wuxi Jiangsu 214122, China
  • Received: 2017-09-13 Revised: 2017-10-15 Online: 2018-03-10 Published: 2018-03-07
  • Corresponding author: XU Hua
  • About the authors: WANG Lingdi (1991-), female, born in Suzhou, Anhui, M. S. candidate; her main research interests include machine learning and data mining. XU Hua (1978-), female, born in Wuxi, Jiangsu, associate professor, Ph. D.; her main research interests include computational intelligence, job-shop scheduling, and big data.
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Jiangsu Province (BK20140165).



Abstract: To address the problem of measuring diversity among the weak classifiers created by AdaBoost, as well as AdaBoost's overfitting problem, an improved AdaBoost method based on the double-fault measure was proposed, building on an analysis of the relationship between four diversity measures and the classification accuracy of AdaBoost. Firstly, the Q statistic, correlation coefficient, disagreement measure and double-fault measure were evaluated experimentally on data sets from the UCI (University of California, Irvine) machine learning repository. Then, the correlation between diversity and the ensemble classifier's test error was quantified with the Pearson correlation coefficient. The results show that each measure tends to a stable value in the later stage of iteration; in particular, the double-fault measure changes in the same pattern across different data sets, increasing in the early stage and leveling off in the later stage of iteration. Finally, a weak-classifier selection strategy based on the double-fault measure was put forward. The experimental results show that, compared with other commonly used ensemble methods, the improved AdaBoost algorithm reduces the test error by 1.5 percentage points on average, and by up to 4.8 percentage points. Therefore, the proposed algorithm can further improve classification performance.

Key words: diversity, AdaBoost, ensemble learning, double-fault measure, weak classifier
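The four pairwise diversity measures named in the abstract are all defined over the 2×2 contingency table of two classifiers' correct/incorrect decisions on the same test set. As a rough illustration (not the authors' code; function and variable names are my own, and the denominators are assumed to be nonzero), they can be computed as follows:

```python
import numpy as np

def pairwise_diversity(pred_i, pred_j, y):
    """Pairwise diversity between two classifiers given their predictions
    pred_i, pred_j and the true labels y (all 1-D arrays of equal length).

    Returns (Q statistic, correlation coefficient, disagreement measure,
    double-fault measure). Denominators are assumed to be nonzero.
    """
    ci = (pred_i == y)            # correctness indicator of classifier i
    cj = (pred_j == y)            # correctness indicator of classifier j
    n = len(y)
    a = np.sum(ci & cj) / n       # fraction where both are correct
    b = np.sum(ci & ~cj) / n      # i correct, j wrong
    c = np.sum(~ci & cj) / n      # i wrong, j correct
    d = np.sum(~ci & ~cj) / n     # both wrong

    q = (a * d - b * c) / (a * d + b * c)                    # Q statistic
    rho = (a * d - b * c) / np.sqrt(
        (a + b) * (c + d) * (a + c) * (b + d))               # correlation
    dis = b + c                   # disagreement: fraction where they differ
    df = d                        # double-fault: fraction where both fail
    return q, rho, dis, df
```

Note that the double-fault measure is the only one of the four that distinguishes "both wrong" from "both correct", which is why it is the natural candidate for steering weak-classifier selection: a pair of classifiers with a low double-fault value rarely fails on the same examples.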
