Journal of Computer Applications, 2018, Vol. 38, Issue (6): 1591-1595. DOI: 10.11772/j.issn.1001-9081.2017122900

• Data Science and Technology •

New ensemble classification algorithm for data stream with noise

YUAN Quan1,2, GUO Jiangfan1

  1. Research Center of New Telecommunication Technology Applications, Chongqing University of Posts and Telecommunications, Chongqing 400065, China;
  2. Chongqing Information Technology Designing Company Limited, Chongqing 401121, China
  • Received: 2017-12-12 Revised: 2018-02-11 Online: 2018-06-10 Published: 2018-06-13
  • Corresponding author: GUO Jiangfan
  • About the authors: YUAN Quan (1976-), male, born in Suining, Hunan, senior engineer, M.S.; research interests: digital image processing, new communication technologies. GUO Jiangfan (1991-), male, born in Luohe, Henan, M.S. candidate; research interests: data mining.

Abstract: To address concept drift and noise in data streams, a new incremental-learning ensemble classification algorithm for data streams was proposed. Firstly, a noise filtering mechanism was introduced to filter out noise. Then, a hypothesis testing method was introduced to detect concept drift, and incremental C4.5 decision trees were used as base classifiers to construct a weighted ensemble model. Finally, instances were learned incrementally and the classification model was updated dynamically as they arrived. The experimental results show that the detection accuracy of the proposed ensemble classifier for concept drift reaches 95%-97%, and its noise immunity on data streams stays above 90%. The proposed algorithm achieves relatively high classification accuracy and performs well in both concept drift detection accuracy and noise immunity.

Key words: data stream, noise, concept drift, classification algorithm, classification accuracy
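The abstract names three components (a noise filter, a hypothesis-test drift detector and a weighted ensemble) but does not give their exact rules. The Python sketch below shows one way these pieces can fit together, under assumptions that are not taken from the paper:

* The drift test is assumed to be a one-sided two-proportion z-test. It compares the error rate in a reference window with the error rate in a sliding recent window.
* The noise filter is assumed to be an ensemble-disagreement filter.
* Base-classifier outputs are combined by weighted majority vote.
* The incremental C4.5 trees themselves are omitted.

The names `DriftDetector`, `two_proportion_z`, `weighted_vote` and `is_noisy`, the window size and the thresholds are all hypothetical.

```python
import math
from collections import Counter, deque


def two_proportion_z(err_ref, n_ref, err_new, n_new):
    """z statistic for H0: the reference and recent windows share one error rate."""
    p_pool = (err_ref + err_new) / (n_ref + n_new)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ref + 1 / n_new))
    if se == 0:
        return 0.0  # both windows all-correct or all-wrong: no evidence of change
    return (err_new / n_new - err_ref / n_ref) / se


class DriftDetector:
    """Assumed detector: one-sided test that the recent error rate exceeds the reference."""

    def __init__(self, window=100, z_crit=2.33):  # 2.33 ~ one-sided alpha of 0.01
        self.window = window
        self.z_crit = z_crit
        self.ref = deque(maxlen=window)     # error history fixed since the last drift
        self.recent = deque(maxlen=window)  # sliding window of the newest errors

    def update(self, is_error):
        """Feed one prediction outcome; return True when drift is signalled."""
        if len(self.ref) < self.window:
            self.ref.append(int(is_error))
            return False
        self.recent.append(int(is_error))
        if len(self.recent) < self.window:
            return False
        z = two_proportion_z(sum(self.ref), len(self.ref),
                             sum(self.recent), len(self.recent))
        if z > self.z_crit:
            # The recent window describes the new concept and becomes the reference.
            self.ref = deque(self.recent, maxlen=self.window)
            self.recent = deque(maxlen=self.window)
            return True
        return False


def weighted_vote(predictions, weights):
    """Weighted majority vote over the base classifiers' predicted labels."""
    score = Counter()
    for label, w in zip(predictions, weights):
        score[label] += w
    return score.most_common(1)[0][0]


def is_noisy(label, predictions, weights, threshold=0.8):
    """Hypothetical filter: flag an instance as noise when at least `threshold`
    of the total ensemble weight predicts a label other than the given one."""
    total = sum(weights)
    against = sum(w for p, w in zip(predictions, weights) if p != label)
    return total > 0 and against / total >= threshold
```

A full stream loop could use these pieces for each arriving instance in the following order:

1. Screen the instance with `is_noisy`.
2. Classify it with `weighted_vote`, using, for example, each member's accuracy on the latest data chunk as its weight.
3. Pass its prediction error to `DriftDetector.update`. A `True` result would then trigger retraining or replacing ensemble members.

This sketch compares fixed-size windows. It does not reproduce the paper's detector.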
