Distributed data stream clustering algorithm based on affinity propagation

doi:10.11772/j.issn.1001-9081.2013.09.2477

Journal of Computer Applications, 2013, Vol. 33, Issue (09): 2477-2481.

ZHANG Jianpeng¹, JIN Xin¹, CHEN Fucai¹, CHEN Hongchang², HOU Ying¹

1. China National Digital Switching System Engineering and Technological R&D Center, Zhengzhou, Henan 450002, China;
2. National Computer Network and Information Security Administration Center, Beijing 100031, China

Received: 2013-04-01 Revised: 2013-04-27 Published: 2013-09-01 Online: 2013-10-18
Corresponding author: ZHANG Jianpeng
About the authors: ZHANG Jianpeng (born 1988), male, from Langfang, Hebei, Ph.D. candidate; main research interest: data stream mining;
JIN Xin (born 1982), male, from Beijing, senior engineer; main research interest: communication and information systems;
CHEN Fucai (born 1974), male, from Gao'an, Jiangxi, research fellow; main research interests: telecommunication network information protection and anomaly detection;
CHEN Hongchang (born 1964), male, from Zhengzhou, Henan, professor and doctoral supervisor; main research interests: telecommunication network information protection and anomaly detection;
HOU Ying (born 1974), female, from Tangshan, Hebei, associate professor, Ph.D.; main research interest: network anomaly detection.
Funding:
National High Technology Research and Development Program of China (863 Program)

Abstract

To address the low clustering quality and high communication cost of existing distributed data stream clustering algorithms, a distributed data stream clustering algorithm (DAPDC) that combines density-based clustering with representative-point clustering is proposed. Each local site runs affinity propagation clustering and uses cluster representative points to summarize its local data distribution. The global site then merges the summary data structures uploaded by the local sites, using an improved density-based clustering algorithm, to obtain the global model. Simulation results show that DAPDC clearly improves the clustering quality of data streams in a distributed environment. Because it works with cluster representative points, the algorithm can also discover clusters of different shapes and significantly reduces the amount of data transferred.

Key words: data mining, distributed clustering, data stream, affinity propagation, density-based clustering
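
The two-level scheme described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names (`affinity_propagation`, `local_summary`, `merge_representatives`), the damping value, the `eps` radius, and the toy site data. The local step is a textbook affinity propagation with a median preference. The global step is a plain distance-threshold connectivity merge standing in for the paper's improved density clustering, whose details the abstract does not give.

```python
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def affinity_propagation(points, damping=0.7, iters=100):
    """Textbook affinity propagation; returns each point's exemplar index."""
    n = len(points)
    if n == 1:
        return [0]
    # Similarity is negative squared distance; preference is the median.
    S = [[-sq_dist(p, q) for q in points] for p in points]
    pref = median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = pref
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Responsibility update (damped).
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else scores[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        # Availability update (damped).
        for k in range(n):
            pos = [max(0.0, R[i][k]) if i != k else 0.0 for i in range(n)]
            total = sum(pos)
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    evidence = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if evidence[k] > 0]
    if not exemplars:  # assumed fallback: keep the strongest candidate
        exemplars = [max(range(n), key=evidence.__getitem__)]
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def local_summary(points):
    """Summarize a local site as (representative point, weight) pairs."""
    weights = {}
    for k in affinity_propagation(points):
        weights[k] = weights.get(k, 0) + 1
    return [(points[k], w) for k, w in sorted(weights.items())]


def merge_representatives(reps, eps):
    """Label representatives; those chained by links <= eps share a label."""
    labels = [None] * len(reps)
    cluster = 0
    for start in range(len(reps)):
        if labels[start] is not None:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(reps)):
                close = sq_dist(reps[i][0], reps[j][0]) <= eps ** 2
                if labels[j] is None and close:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels


# Hypothetical toy data: three local sites observing two well-separated groups.
site_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.4, 0.2)]
site_b = [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.3, 4.8)]
site_c = [(0.3, 0.4), (0.1, 0.1), (5.1, 4.9), (4.8, 5.2)]

summaries = [local_summary(site) for site in (site_a, site_b, site_c)]
reps = [rep for summary in summaries for rep in summary]
labels = merge_representatives(reps, eps=1.5)
```

In this sketch only the weighted representatives in `summaries` would travel from a local site to the global site, which is where the reduction in transferred data comes from. Chaining nearby representatives, rather than assigning them to a single center, is what lets a global cluster take a non-spherical shape.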

CLC number: TP181

ZHANG Jianpeng, JIN Xin, CHEN Fucai, CHEN Hongchang, HOU Ying. Distributed data stream clustering algorithm based on affinity propagation[J]. Journal of Computer Applications, 2013, 33(09): 2477-2481.
