Journal of Computer Applications, 2015, Vol. 35, Issue (9): 2579-2583. DOI: 10.11772/j.issn.1001-9081.2015.09.2579

• Artificial Intelligence •

Nonlinear AdaBoost algorithm based on statistics for K-nearest neighbors

GOU Fu, ZHENG Kai

  1. Computer Center, East China Normal University, Shanghai 200062, China
  • Received: 2015-04-20 Revised: 2015-05-26 Online: 2015-09-10 Published: 2015-09-17
  • Corresponding author: ZHENG Kai (1968-), male, born in Ningbo, Zhejiang, associate professor, Ph.D.; research interests: computer networks, cloud computing; kzheng@cs.ecnu.edu.cn
  • About the author: GOU Fu (1989-), male, born in Datong, Shanxi, master's student; research interests: data mining, machine learning
  • Supported by:
    National 863 Program of China (2013AA01A211).

Abstract: AdaBoost is one of the most widely used boosting algorithms in data mining. This paper analyzes the shortcomings of the traditional AdaBoost, which combines its base classifiers by linear addition, and proposes an improved weighting scheme for the weak classifiers. The linear sum is replaced by a nonlinear combination, and the fixed weight coefficients obtained during training are replaced by dynamic parameters determined by each instance at prediction time. These parameters are computed from statistics of the classification results on the K nearest neighbors of the instance to be classified, so that the weight of each base classifier better reflects its actual reliability for that instance. Experimental results show that, compared with the traditional AdaBoost, the proposed nonlinear algorithm improves accuracy to varying degrees on different data sets, by up to 7 percentage points. The authors conclude that the proposed algorithm is more accurate and achieves higher classification accuracy on the vast majority of data sets.

Key words: AdaBoost, data mining, classifier, nonlinear, K-nearest neighbor
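
The abstract does not give the exact weighting formula, so the following Python sketch only illustrates the general idea under stated assumptions. It trains an ordinary AdaBoost ensemble of decision stumps, then at prediction time replaces each fixed training weight with a weight derived from how often that stump is correct on the K nearest training neighbors of the query. The local-accuracy statistic, the Laplace smoothing, the log-odds mapping, and all function names (`adaboost`, `predict_knn_weighted`, and so on) are assumptions made for illustration, not the authors' published method.

```python
import math

def stump_predict(stump, x):
    """A decision stump: predict `pol` if x[f] <= t, otherwise -pol."""
    f, t, pol = stump
    return pol if x[f] <= t else -pol

def train_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                stump = (f, t, pol)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best

def adaboost(X, y, rounds):
    """Standard discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict_linear(model, x):
    """Traditional AdaBoost: fixed training-time weights alpha."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score >= 0 else -1

def knn_indices(X, x, k):
    """Indices of the k training points closest to x (squared Euclidean)."""
    order = sorted(range(len(X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)))
    return order[:k]

def predict_knn_weighted(model, X, y, x, k):
    """ASSUMED variant: weight each stump by its accuracy on x's k neighbors."""
    nbrs = knn_indices(X, x, k)
    score = 0.0
    for _, stump in model:                       # training alpha is ignored
        correct = sum(stump_predict(stump, X[i]) == y[i] for i in nbrs)
        acc = (correct + 1) / (len(nbrs) + 2)    # Laplace smoothing (assumption)
        beta = 0.5 * math.log(acc / (1 - acc))   # log-odds weight (assumption)
        score += beta * stump_predict(stump, x)
    return 1 if score >= 0 else -1

# Toy one-dimensional data: negatives on the left, positives on the right.
X = [[0], [1], [2], [3], [4], [5]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=3)
print(predict_linear(model, [0.5]), predict_knn_weighted(model, X, y, [0.5], k=3))
```

Because the weights in `predict_knn_weighted` depend on the query instance, a stump that is unreliable in one region of the feature space can be down-weighted, or even inverted, there while keeping its influence elsewhere. A fixed linear combination cannot do this.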

CLC number: