Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (05): 1363-1366. DOI: 10.3724/SP.J.1087.2011.01363

• Database Technology •

Density-based data stream clustering algorithm over time-based sliding windows

LI Na, XING Chang-zheng

  1. College of Electronics and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105, China
  • Received: 2010-10-20 Revised: 2010-12-13 Online: 2011-05-01 Published: 2011-05-01
  • Corresponding author: LI Na
  • About the authors: LI Na (born 1986), female, from Liaocheng, Shandong, master's student; research interests: data mining and data stream clustering. XING Chang-zheng (born 1967), male, from Fuxin, Liaoning, professor, Ph.D.; research interests: databases, data mining, and data stream clustering.

Abstract: To improve the quality and efficiency of data stream clustering, this paper proposes a density-based algorithm over time-based sliding windows. First, a sliding window of equal time span is adopted. Second, an improved micro-cluster structure stores summary information about the stream. Finally, a micro-cluster deletion strategy periodically removes expired and outlier (isolated) micro-clusters. Experiments on real and synthetic data sets show that, compared with traditional clustering algorithms based on the landmark model, the proposed algorithm achieves better efficiency, lower memory overhead, and fast data processing.

Key words: data stream, clustering, sliding window, micro-cluster, landmark model
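
The abstract names three ingredients: a time-based sliding window, micro-clusters that summarize the stream, and periodic deletion of expired and outlier micro-clusters. The sketch below illustrates how these pieces might fit together; it is not the authors' algorithm. The abstract does not specify the improved micro-cluster structure, so the feature vector used here (count, linear sum, squared sum, last-update time) is a common CluStream-style summary chosen as an assumption. The class names, the fixed `radius` assignment rule, the `min_points` outlier test, and all parameter values are likewise hypothetical.

```python
import math
from dataclasses import dataclass


def distance(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


@dataclass
class MicroCluster:
    """Assumed summary: point count, linear sum, squared sum, last update time."""
    n: int
    ls: list
    ss: list
    last_update: float

    @classmethod
    def from_point(cls, point, t):
        return cls(1, list(point), [x * x for x in point], t)

    def center(self):
        return [s / self.n for s in self.ls]

    def absorb(self, point, t):
        self.n += 1
        self.ls = [a + x for a, x in zip(self.ls, point)]
        self.ss = [a + x * x for a, x in zip(self.ss, point)]
        self.last_update = t


class SlidingWindowClusterer:
    """Hypothetical online phase: assign points, prune at fixed time intervals."""

    def __init__(self, window_span, radius, min_points, prune_every):
        self.window_span = window_span  # time span covered by the window
        self.radius = radius            # max distance for joining a micro-cluster
        self.min_points = min_points    # below this, a micro-cluster is "sparse"
        self.prune_every = prune_every  # time between deletion passes
        self.last_prune = 0.0
        self.clusters = []

    def insert(self, point, t):
        nearest = min(self.clusters,
                      key=lambda c: distance(c.center(), point),
                      default=None)
        if nearest is not None and distance(nearest.center(), point) <= self.radius:
            nearest.absorb(point, t)
        else:
            self.clusters.append(MicroCluster.from_point(point, t))
        if t - self.last_prune >= self.prune_every:
            self.prune(t)

    def prune(self, now):
        """Delete expired micro-clusters and sparse ones untouched since the last pass."""
        cutoff = now - self.window_span
        kept = []
        for c in self.clusters:
            expired = c.last_update < cutoff
            outlier = c.n < self.min_points and c.last_update <= self.last_prune
            if not (expired or outlier):
                kept.append(c)
        self.clusters = kept
        self.last_prune = now
```

Because each micro-cluster keeps only sums and a timestamp, memory grows with the number of micro-clusters rather than with the number of points seen. Pruning on last-update time drops whole summaries once they fall outside the window; a landmark model, by contrast, keeps accumulating from a fixed starting point.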