Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (09): 2477-2481. DOI: 10.11772/j.issn.1001-9081.2013.09.2477

• Database Technology •

Distributed data stream clustering algorithm based on affinity propagation

ZHANG Jianpeng1, JIN Xin1, CHEN Fucai1, CHEN Hongchang2, HOU Ying1

  1. China National Digital Switching System Engineering and Technological R&D Center, Zhengzhou Henan 450002, China;
  2. National Computer Network and Information Security Administration Center, Beijing 100031, China
  • Received: 2013-04-01 Revised: 2013-04-27 Published: 2013-09-01 Online: 2013-10-18
  • Contact: ZHANG Jianpeng
  • About the authors: ZHANG Jianpeng (born 1988), male, from Langfang, Hebei, Ph.D. candidate; research interest: data stream mining;
    JIN Xin (born 1982), male, from Beijing, senior engineer; research interest: communication and information systems;
    CHEN Fucai (born 1974), male, from Gao'an, Jiangxi, research fellow; research interests: telecommunication network information protection and anomaly detection;
    CHEN Hongchang (born 1964), male, from Zhengzhou, Henan, professor and doctoral supervisor; research interests: telecommunication network information protection and anomaly detection;
    HOU Ying (born 1974), female, from Tangshan, Hebei, associate professor, Ph.D.; research interest: network anomaly detection.
  • Supported by:

    National High Technology Research and Development Program of China (863 Program)

Abstract: To address the low clustering quality and high communication cost of existing distributed data stream clustering algorithms, a distributed data stream clustering algorithm (DAPDC) that combines density-based clustering with representative-point clustering is proposed. Each local site runs affinity propagation clustering and uses cluster representative points to summarize its local data distribution. The global site then merges the summary data structures uploaded by the local sites with an improved density-based clustering algorithm to obtain the global model. Simulation results show that DAPDC markedly improves clustering quality for data streams in a distributed environment. By using cluster representative points, it can also find clusters of different shapes and significantly reduce the volume of data transmitted.

Key words: data mining, distributed clustering, data stream, affinity propagation, density-based clustering

CLC number:
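
The two-tier scheme outlined in the abstract (affinity propagation at each local site, then a density-based merge of the uploaded representatives at the global site) can be illustrated with a minimal Python sketch. This is not the authors' DAPDC implementation: the abstract does not specify the summary data structure or the "improved" density clustering, so the sketch assumes each representative is an exemplar point plus the count of local points it stands for, and it substitutes a plain weighted DBSCAN-style merge. All function names (`affinity_propagation`, `local_summary`, `global_merge`) and the parameters `damping`, `iterations`, `eps`, and `min_weight` are hypothetical, as is the synthetic data.

```python
import random
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.7, iterations=200):
    """Standard affinity propagation message passing; returns exemplar indices."""
    n = len(points)
    s = [[-sq_dist(p, q) for q in points] for p in points]
    # Assumed preference: median of the off-diagonal similarities.
    pref = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):
            v = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=v.__getitem__)
            runner_up = max(v[k] for k in range(n) if k != best)
            for k in range(n):
                target = s[i][k] - (runner_up if k == best else v[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * target
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # positive responsibilities from i' != k
            for i in range(n):
                target = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * target
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    # Fallback (an assumption of this sketch): keep the single best candidate.
    return exemplars or [max(range(n), key=lambda k: r[k][k] + a[k][k])]


def local_summary(points):
    """Local site: summarize the data as (exemplar point, weight) pairs."""
    exemplars = affinity_propagation(points)
    weights = {k: 0 for k in exemplars}
    for p in points:
        nearest = min(exemplars, key=lambda k: sq_dist(p, points[k]))
        weights[nearest] += 1
    return [(points[k], w) for k, w in weights.items()]


def global_merge(reps, eps, min_weight):
    """Global site: weighted DBSCAN-style clustering of representatives."""
    n = len(reps)
    neighbours = [[j for j in range(n)
                   if sq_dist(reps[i][0], reps[j][0]) <= eps ** 2]
                  for i in range(n)]
    core = [sum(reps[j][1] for j in neighbours[i]) >= min_weight
            for i in range(n)]
    labels = [-1] * n  # -1 marks noise
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            if not core[u]:
                continue  # border representatives join but do not expand
            for j in neighbours[u]:
                if labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels


# Synthetic example: three sites, each observing two well-separated groups.
rng = random.Random(0)
centres = [(0.0, 0.0), (10.0, 10.0)]
sites = [[(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
          for cx, cy in centres for _ in range(8)]
         for _ in range(3)]
uploaded = [rep for site in sites for rep in local_summary(site)]
labels = global_merge(uploaded, eps=3.0, min_weight=5)
raw_count = sum(len(site) for site in sites)
print(len(uploaded), "representatives uploaded for", raw_count, "raw points")
print(len(set(labels) - {-1}), "global clusters")
```

Only the weighted representatives cross the network in this sketch, which is the source of the reduced transmission volume the abstract refers to. Carrying a weight with each representative lets the global density test approximate the density of the underlying raw points rather than that of the much sparser set of exemplars.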