Ensemble classification algorithm for high speed data stream

doi:10.3724/SP.J.1087.2012.00629

Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (03): 629-633. DOI: 10.3724/SP.J.1087.2012.00629


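LI Nan^1,2, GUO Gong-de^1,2

1. School of Mathematics and Computer Science, Fujian Normal University, Fuzhou, Fujian 350007, China;
2. Key Laboratory of Network Security and Cryptography, Fujian Normal University, Fuzhou, Fujian 350007, China

Received: 2011-08-30 Revised: 2011-11-20 Online: 2012-03-01 Published: 2012-03-01
Corresponding author: GUO Gong-de
About the authors: LI Nan (1987-), male, born in Fuzhou, Fujian, master's student; main research interests: information fusion, data stream mining. GUO Gong-de (1965-), male, born in Longyan, Fujian, professor, Ph.D.; main research interests: data mining, machine learning.
Supported by:
National Natural Science Foundation of China (61070062, 61175123); Major Science and Technology Project for University-Industry Cooperation of Fujian Province (2010H6007).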


Abstract

Data stream mining requires algorithms that process data quickly, use little memory, and adapt to concept drift. This paper proposes an ensemble classification algorithm for high-speed data streams. The algorithm divides the original data stream into data blocks along the time axis and computes, on each block, the central point and corresponding subspace of every class. The central points and subspaces from all blocks are then integrated to form the classification model, and statistical theory is used to detect concept drift and adjust the model dynamically. Experimental results show that the method classifies data streams quickly while adapting to concept drift, and achieves good classification performance.

Key words: concept drift, data stream, subspace, classification, integration
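
The pipeline outlined in the abstract (split the stream into blocks, summarize each class on each block by a central point and a subspace, pool the summaries into one model, and reset the model when drift is detected) can be illustrated with a minimal sketch. The abstract does not say how the subspace is computed or which statistical test is used, so everything specific here is an assumption rather than the authors' method: the "subspace" is approximated by inverse-variance dimension weights, classification is nearest weighted centroid, and drift is flagged when a block's error rate exceeds a running baseline by `z` binomial standard deviations. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict

def summarize_block(X, y, eps=1e-6):
    """Return {label: (center, weights)} for one data block."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    summary = {}
    for label, rows in groups.items():
        n, d = len(rows), len(rows[0])
        center = [sum(r[j] for r in rows) / n for j in range(d)]
        var = [sum((r[j] - center[j]) ** 2 for r in rows) / n for j in range(d)]
        # Assumption: the "subspace" is a soft weighting of dimensions,
        # with low-variance dimensions counting more.
        inv = [1.0 / (v + eps) for v in var]
        total = sum(inv)
        summary[label] = (center, [w / total for w in inv])
    return summary

def predict(ensemble, x):
    """Label of the nearest weighted center over all retained blocks."""
    best_label, best_dist = None, math.inf
    for summary in ensemble:
        for label, (center, weights) in summary.items():
            dist = sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center))
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

def drift_detected(err, base_err, n, z=3.0):
    """Assumed test: block error exceeds the baseline by z standard deviations."""
    sd = math.sqrt(max(base_err * (1.0 - base_err), 1e-9) / n)
    return err > base_err + z * sd

def process_stream(blocks, max_members=5):
    """Test on each labeled block, then learn from it; reset on drift."""
    ensemble, base_err, errors = [], None, []
    for X, y in blocks:
        if ensemble:
            wrong = sum(predict(ensemble, x) != t for x, t in zip(X, y))
            err = wrong / len(y)
            errors.append(err)
            if base_err is not None and drift_detected(err, base_err, len(y)):
                ensemble, base_err = [], None  # discard outdated summaries
            else:
                base_err = err if base_err is None else 0.5 * (base_err + err)
        ensemble.append(summarize_block(X, y))
        ensemble = ensemble[-max_members:]  # bound memory use
    return ensemble, errors
```

Calling `process_stream` with an iterable of `(X, y)` blocks returns the retained block summaries and the per-block error rates. Because each block is reduced to one center and one weight vector per class, memory grows with the number of classes and retained blocks rather than with the number of instances, which is the property the abstract emphasizes.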

CLC Number:

TP18
TP311

LI Nan, GUO Gong-de. Ensemble classification algorithm for high speed data stream[J]. Journal of Computer Applications, 2012, 32(03): 629-633.


