Journal of Computer Applications, 2015, Vol. 35, Issue (7): 1927-1932. DOI: 10.11772/j.issn.1001-9081.2015.07.1927

• Data Technology •

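Evolutionary data stream clustering algorithm based on integration of affinity propagation and density

XING Changzheng, LIU Jian

  1. Graduate School, Liaoning Technical University, Xingcheng Liaoning 125105, China
  • Received: 2015-01-15 Revised: 2015-03-25 Online: 2015-07-10 Published: 2015-07-17
  • Corresponding author: LIU Jian (born 1990), male, born in Hengyang, Hunan; M.S. candidate; research interests: data mining, databases; 954443316@qq.com
  • About the author: XING Changzheng (born 1967), male, born in Fuxin, Liaoning; professor, Ph.D.; research interests: data mining, databases
  • Foundation item: National Natural Science Foundation of China (61402212).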



Abstract:

To address the problems that outliers in data streams are not handled well, that data stream clustering is inefficient, and that dynamic changes in a data stream cannot be detected in real time, an evolutionary data stream clustering algorithm based on the integration of affinity propagation and density (I-APDenStream) was proposed. The algorithm uses the traditional two-stage processing model, namely online clustering and offline clustering. It introduces a decay density for micro-clusters, which reflects the dynamic changes of the data stream, together with a deletion mechanism for maintaining micro-clusters dynamically online; in addition, when the model is rebuilt with the extended Weighted Affinity Propagation (WAP) clustering, it applies an outlier detection and deletion mechanism. Experimental results on two types of data sets show that the clustering accuracy of the proposed algorithm generally stays above 95%, and that it also compares favorably with other algorithms in purity and in other related tests. The proposed algorithm can cluster data streams in real time with high quality and high efficiency.

Key words: outlier, data stream clustering, Affinity Propagation (AP), micro-cluster
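The abstract names the main mechanisms (a decay density for micro-clusters, online deletion of micro-clusters, outlier removal, and model reconstruction with weighted affinity propagation) but does not give their formulas. The Python sketch below illustrates one plausible reading under explicit assumptions and is not the authors' method. It assumes the following: the decay density uses a DenStream-style fading factor 2^(-λΔt); a point joins its nearest micro-cluster when it lies within a fixed radius; and micro-clusters whose decayed density falls below a threshold are deleted online. In the offline phase, it further assumes that micro-clusters below an outlier weight are dropped before standard affinity propagation runs on StrAP-style weighted similarities -w_i·d². All names and values (`DECAY_RATE`, `RADIUS`, `PRUNE_WEIGHT`, `online_step`, `offline_rebuild`, `preference`, `outlier_weight`) are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters (assumptions, not values from the paper).
DECAY_RATE = 0.25   # lambda in the assumed fading factor 2 ** (-lambda * dt)
RADIUS = 1.0        # max distance for a point to join an existing micro-cluster
PRUNE_WEIGHT = 0.5  # online deletion threshold on the decayed density


@dataclass
class MicroCluster:
    center: list        # weighted mean of the absorbed points
    weight: float       # decayed density: sum of faded point weights
    last_update: int    # timestamp of the last decay

    def decay_to(self, t: int) -> None:
        """Fade the density to time t; the center, a weighted mean, is unchanged."""
        self.weight *= 2 ** (-DECAY_RATE * (t - self.last_update))
        self.last_update = t

    def absorb(self, x, t: int) -> None:
        """Add point x with weight 1: new center = (w * c + x) / (w + 1)."""
        self.decay_to(t)
        new_weight = self.weight + 1.0
        self.center = [c + (xi - c) / new_weight for c, xi in zip(self.center, x)]
        self.weight = new_weight


def online_step(mcs, x, t: int) -> None:
    """Online phase: absorb x or open a new micro-cluster, then delete faded ones."""
    for mc in mcs:
        mc.decay_to(t)
    nearest = min(mcs, key=lambda mc: math.dist(mc.center, x), default=None)
    if nearest is not None and math.dist(nearest.center, x) <= RADIUS:
        nearest.absorb(x, t)
    else:
        mcs.append(MicroCluster(list(x), 1.0, t))
    # Assumed deletion rule: drop micro-clusters whose decayed density is too low.
    mcs[:] = [mc for mc in mcs if mc.weight >= PRUNE_WEIGHT]


def affinity_propagation(S, iterations=200, damping=0.5):
    """Standard AP message passing on similarity matrix S; returns exemplar indices."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max((vals[k] for k in range(n) if k != best), default=-math.inf)
            for k in range(n):
                rival = second if k == best else vals[best]  # max over k' != k
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # sum over i' != k of max(0, r(i', k))
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [k for k in range(n) if A[k][k] + R[k][k] > 0]


def offline_rebuild(mcs, preference, outlier_weight):
    """Offline phase: drop outlier micro-clusters, then run weighted AP on the rest."""
    kept = [mc for mc in mcs if mc.weight >= outlier_weight]  # assumed outlier rule
    n = len(kept)
    # StrAP-style weighted similarity (assumption): s(i, k) = -w_i * d(i, k)^2.
    S = [[preference if i == k
          else -kept[i].weight * math.dist(kept[i].center, kept[k].center) ** 2
          for k in range(n)] for i in range(n)]
    exemplars = affinity_propagation(S)
    if not exemplars:
        return kept, [], []
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return kept, exemplars, labels
```

Consider micro-clusters at 0, 1, 10 and 11 with weights 1, 3, 3 and 1, plus a light micro-cluster (weight 0.2) at 50. With `preference=-10.0` and `outlier_weight=0.5`, the sketch discards the light micro-cluster as an outlier. It then selects the two heavier micro-clusters (at 1 and 10) as exemplars. Weighting each similarity by the micro-cluster's density makes it costlier for a dense micro-cluster to be represented by a distant exemplar, which is the intuition behind weighted AP.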

CLC number: