Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (03): 629-633. DOI: 10.3724/SP.J.1087.2012.00629

• Artificial Intelligence •

Ensemble classification algorithm for high-speed data streams

LI Nan1,2, GUO Gong-de1,2

  1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian 350007, China;
  2. Key Laboratory of Network Security and Cryptography, Fujian Normal University, Fuzhou, Fujian 350007, China
  • Received: 2011-08-30 Revised: 2011-11-20 Online: 2012-03-01 Published: 2012-03-01
  • Contact: GUO Gong-de
  • About the authors: LI Nan (born 1987), male, from Fuzhou, Fujian, master's student; main research interests: information fusion, data stream mining. GUO Gong-de (born 1965), male, from Longyan, Fujian, professor, Ph.D.; main research interests: data mining, machine learning.
  • Supported by:

    National Natural Science Foundation of China (61070062, 61175123); Major Science and Technology Project for University-Industry Cooperation of Fujian Province (2010H6007).

Abstract: Data stream mining requires algorithms that process data quickly and adapt to concept drift while using little memory. To this end, this paper proposes an ensemble classification algorithm for high-speed data streams. The algorithm divides the original data stream along the time axis into a sequence of data blocks and computes, on each block, the central point and the corresponding subspace of every class. The central points and subspaces of every class from all blocks are then integrated into the classification model, and statistical theory is used to detect concept drift and adjust the model dynamically. Experimental results show that the proposed method classifies data streams quickly while adapting to concept drift, and achieves good classification performance.

Key words: concept drift, data stream, subspace, classification, ensemble
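
The abstract gives only the outline of the method, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that each class "subspace" is spanned by the top-k principal directions of that class within a block, that a sample is assigned to the class whose centroid-anchored subspace is closest on average over all blocks, and that drift is flagged by a simple binomial-style comparison of error rates. The names `fit_block`, `residual`, `predict`, `drift_detected` and the parameters `k` and `z` are hypothetical.

```python
import numpy as np


def fit_block(X, y, k=1):
    """Summarise one data block as {label: (centroid, basis)}.

    Assumption: the class subspace is spanned by the top-k principal
    directions of the class; the paper's construction may differ.
    """
    model = {}
    for label in np.unique(y):
        Xc = X[y == label]
        centroid = Xc.mean(axis=0)
        # Rows of vt are the principal directions of the centred class data.
        _, _, vt = np.linalg.svd(Xc - centroid, full_matrices=False)
        model[label] = (centroid, vt[:k])
    return model


def residual(x, centroid, basis):
    """Distance from x to the affine subspace centroid + span(basis)."""
    d = x - centroid
    return np.linalg.norm(d - basis.T @ (basis @ d))


def predict(ensemble, x):
    """Return the label with the smallest residual averaged over all blocks."""
    scores = {}
    for model in ensemble:
        for label, (centroid, basis) in model.items():
            scores.setdefault(label, []).append(residual(x, centroid, basis))
    return min(scores, key=lambda lab: np.mean(scores[lab]))


def drift_detected(err_new, err_ref, n, z=3.0):
    """Assumed test: flag drift when the error rate on a new block of n
    samples exceeds the reference rate by more than z standard deviations
    of a binomial proportion."""
    sd = np.sqrt(max(err_ref * (1.0 - err_ref), 1e-12) / n)
    return err_new > err_ref + z * sd


# Toy usage: two blocks of a 2-D stream whose classes lie along horizontal lines.
X = np.array([[-2, 0.1], [-1, -0.1], [0, 0.1], [1, -0.1], [2, 0.1],
              [-2, 5.1], [-1, 4.9], [0, 5.1], [1, 4.9], [2, 5.1]])
y = np.array([0] * 5 + [1] * 5)
ensemble = [fit_block(X, y), fit_block(X + 0.05, y)]
label = predict(ensemble, np.array([10.0, 0.1]))
```

In this toy setup the query point lies far from both centroids but close to the line through class 0, so the subspace residual rather than the centroid distance decides the label. Under these assumptions, a model that adjusts dynamically might drop or refit the oldest block summaries whenever `drift_detected` fires.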

CLC number: