Nonlinear AdaBoost algorithm based on statistics for K-nearest neighbors

Journal of Computer Applications, 2015, Vol. 35, Issue (9): 2579-2583. DOI: 10.11772/j.issn.1001-9081.2015.09.2579


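GOU Fu, ZHENG Kai

Computer Center, East China Normal University, Shanghai 200062, China

Received: 2015-04-20 Revised: 2015-05-26 Online: 2015-09-10 Published: 2015-09-17
Corresponding author: ZHENG Kai (1968-), male, born in Ningbo, Zhejiang; associate professor, Ph.D.; research interests: computer networks, cloud computing; kzheng@cs.ecnu.edu.cn
About the author: GOU Fu (1989-), male, born in Datong, Shanxi; master's student; research interests: data mining, machine learning
Funding: National High Technology Research and Development Program of China (863 Program) (2013AA01A211).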

Abstract

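Abstract: AdaBoost is one of the most widely used boosting algorithms in data mining. This paper analyzes the shortcomings of traditional AdaBoost, which combines its base classifiers by linear addition, and proposes a new way of weighting AdaBoost's weak classifiers: the linear sum is replaced by a nonlinear combination, and the fixed weight coefficients learned during training are replaced by dynamic parameters determined by the specific instance being classified at prediction time. Each parameter is computed from statistics of the classification results on the K nearest neighbors of the instance under test, so that the weight of each base classifier more closely reflects its actual reliability on that instance. Experimental results show that, compared with traditional AdaBoost, the proposed nonlinear algorithm improves classification accuracy on every data set tested, to varying degrees, with a maximum gain of 7 percentage points. These results show that the proposed improvement is a more accurate classification algorithm that achieves higher classification accuracy on most data sets.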

Key words: AdaBoost, data mining, classifier, nonlinear, K-nearest neighbor
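The contrast drawn in the abstract can be illustrated in code. Traditional AdaBoost predicts H(x) = sign(Σ_t α_t·h_t(x)), where each α_t = ½·ln((1 - ε_t)/ε_t) is fixed once training ends (ε_t being the weighted training error of base classifier h_t). The sketch below keeps a standard AdaBoost trainer with decision stumps but, at prediction time, recomputes every base classifier's weight from how many of the K training instances nearest to x it classifies correctly. It is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the exact form of the dynamic parameter, so the Laplace-smoothed local accuracy, the log-odds weight ½·ln(a/(1 - a)), the Euclidean distance, the {-1, +1} labels, and all function names (`train_adaboost`, `predict_linear`, `predict_knn_weighted`) are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Point) -> int:
    """Decision stump: returns polarity if x[feature] <= threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if x[feature] <= threshold else -polarity


def train_adaboost(X: Sequence[Point], y: Sequence[int], rounds: int) -> Tuple[List[Stump], List[float]]:
    """Standard discrete AdaBoost with decision stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n
    stumps: List[Stump] = []
    alphas: List[float] = []
    for _ in range(rounds):
        best, best_err = None, float("inf")
        # Exhaustive search for the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for thr in sorted({xi[f] for xi in X}):
                for pol in (1, -1):
                    s = (f, thr, pol)
                    err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(s, xi) != yi)
                    if err < best_err:
                        best, best_err = s, err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # clamp so the log stays finite
        alpha = 0.5 * math.log((1 - eps) / eps)  # fixed weight of this base classifier
        # Misclassified samples gain weight, correctly classified ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        stumps.append(best)
        alphas.append(alpha)
    return stumps, alphas


def predict_linear(stumps: List[Stump], alphas: List[float], x: Point) -> int:
    """Traditional AdaBoost: sign of the linear sum of alpha_t * h_t(x)."""
    score = sum(a * stump_predict(s, x) for s, a in zip(stumps, alphas))
    return 1 if score >= 0 else -1


def predict_knn_weighted(stumps: List[Stump], X: Sequence[Point], y: Sequence[int], x: Point, k: int) -> int:
    """Hypothetical instance-dependent weighting: each stump's weight is derived
    from its accuracy on the k training instances nearest to x (Euclidean)."""
    neighbors = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
    score = 0.0
    for s in stumps:
        correct = sum(stump_predict(s, X[i]) == y[i] for i in neighbors)
        local_acc = (correct + 1) / (k + 2)  # Laplace smoothing keeps 0 < acc < 1 (assumption)
        weight = 0.5 * math.log(local_acc / (1 - local_acc))  # replaces the fixed alpha_t
        score += weight * stump_predict(s, x)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D data: stump A is right on x <= 5 but wrong on x = 6, 7; stump B is right only on x >= 4.
    X = [(float(v),) for v in range(8)]
    y = [-1, -1, -1, -1, 1, 1, -1, -1]
    A, B = (0, 3.5, -1), (0, 5.5, 1)
    # 1: with hand-picked fixed weights, the globally stronger A outweighs B
    print(predict_linear([A, B], [1.0, 0.5], (6.5,)))
    # -1: near 6.5, A is wrong on both neighbors and B is right on both
    print(predict_knn_weighted([A, B], X, y, (6.5,), k=2))
```

In the toy demo, stump A is the better classifier overall, so the fixed-weight vote follows it at x = 6.5; the two nearest neighbors of that point are exactly where A errs and B is right, so the instance-dependent weights let B decide. Under this assumed rule, a local accuracy below 0.5 yields a negative weight, which effectively inverts a base classifier that is unreliable near x; the abstract does not say whether the paper's parameter behaves this way.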

GOU Fu, ZHENG Kai. Nonlinear AdaBoost algorithm based on statistics for K-nearest neighbors[J]. Journal of Computer Applications, 2015, 35(9): 2579-2583.

