Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (7): 2022-2029. DOI: 10.11772/j.issn.1001-9081.2021050726

• Artificial Intelligence •

Analysis and improvement of AdaBoost's sample weight and combination coefficient

Liang ZHU, Hua XU, Jinhai CHENG, Shen ZHU

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received: 2021-05-08 Revised: 2022-02-10 Accepted: 2022-02-18 Online: 2022-03-08 Published: 2022-07-10
  • Corresponding author: Hua XU
  • About the authors: ZHU Liang, born in 1994 in Fuyang, Anhui, M.S. candidate, CCF member. His research interests include machine learning and data mining.
    CHENG Jinhai, born in 1997 in Nantong, Jiangsu, M.S. candidate. His research interests include data mining, machine learning and embedded software.
    ZHU Shen, born in 1997 in Zhoukou, Henan, M.S. candidate. His research interests include data mining and machine learning.

Abstract:

Aiming at the problems of the low linear combination efficiency of base classifiers and excessive attention to hard examples in the Adaptive Boosting (AdaBoost) algorithm, two improved algorithms based on margin theory, sample Weight and Parameterization of Improved AdaBoost (WPIAda) and sample Weight and Parameterization of Improved AdaBoost-Multitude (WPIAda.M), were proposed. Firstly, both WPIAda and WPIAda.M divide the update of sample weights into four situations: they increase the weights of samples whose margin changes from positive to negative, thereby suppressing the negative movement of the margin and reducing the number of samples whose margin sits at zero. Secondly, according to the error rates of the base classifiers and the distribution of the sample weights, WPIAda.M gives a new method for solving the base classifier coefficients, thereby improving the combination efficiency of the base classifiers. On 10 UCI datasets, compared with algorithms such as WLDF_Ada (dfAda), skAda and SWA-Adaboost (swaAda), WPIAda and WPIAda.M reduced the test error by 7.46 and 7.64 percentage points on average respectively, and increased the Area Under Curve (AUC) by 11.65 and 11.92 percentage points respectively. Experimental results show that WPIAda and WPIAda.M can effectively reduce the attention paid to hard examples, and that WPIAda.M can integrate base classifiers more efficiently; both algorithms therefore further improve classification performance.

Key words: Adaptive Boosting (AdaBoost), margin theory, sample weight, base classifier, combination efficiency

CLC number:
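The abstract describes WPIAda's four-case, margin-based weight update without giving the exact formulas. For orientation only, a minimal sketch of classic discrete AdaBoost with decision stumps, tracking the per-sample margin y·f(x) before and after each round (the quantity the paper's four-case update conditions on), might look like this; the stump learner, toy data and all names are illustrative assumptions, not the paper's method:

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    # Decision stump: predict +1/-1 by thresholding a single feature.
    return sign * np.where(X[:, feat] <= thresh, 1.0, -1.0)

def best_stump(X, y, w):
    # Exhaustive search for the stump with the lowest weighted error.
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1.0, -1.0):
                pred = stump_predict(X, feat, thresh, sign)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

def adaboost(X, y, T=10):
    n = len(y)
    w = np.full(n, 1.0 / n)   # sample weight distribution D_t(i)
    f = np.zeros(n)           # running ensemble score sum_t alpha_t * h_t(x_i)
    ensemble = []
    for _ in range(T):
        err, feat, thresh, sign = best_stump(X, y, w)
        err = max(err, 1e-10)
        if err >= 0.5:        # weak-learning condition violated; stop
            break
        alpha = 0.5 * np.log((1 - err) / err)   # classic AdaBoost coefficient
        pred = stump_predict(X, feat, thresh, sign)
        prev_margin = y * f                      # margin before this round
        f += alpha * pred
        margin = y * f                           # margin after this round
        # Classic multiplicative update shown here. WPIAda would instead split
        # samples into four cases by how (prev_margin, margin) changed, e.g.
        # raising the weight of samples whose margin turned positive->negative;
        # WPIAda.M would additionally recompute alpha from err and the weight
        # distribution.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, feat, thresh, sign))
    return ensemble

def predict(ensemble, X):
    # Sign of the weighted vote of all stumps.
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)
```

On a linearly separable toy set the first stump already fits the data, so the loop mainly illustrates where the margin bookkeeping lives; the interesting behaviour of the four-case update only appears on data with hard examples.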