Grid clustering algorithm based on density peaks

doi:10.11772/j.issn.1001-9081.2017.11.3080

Journal of Computer Applications, 2017, Vol. 37, Issue (11): 3080-3084. DOI: 10.11772/j.issn.1001-9081.2017.11.3080


YANG Jie¹,², WANG Guoyin¹, WANG Fei¹

1. Chongqing Key Laboratory of Computational Intelligence(Chongqing University of Posts and Telecommunications), Chongqing 400065, China;
2. School of Physics and Electronics, Zunyi Normal University, Zunyi Guizhou 563002, China

Received: 2017-05-16; Revised: 2017-06-14; Online: 2017-11-10; Published: 2017-11-11
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61572091), the Chongqing Postgraduate Scientific Research and Innovation Project (CYB16106), the High-end Talent Project (RC2016005), and the Key Discipline Project of Guizhou Province (QXWB[2013]18).


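Corresponding author: WANG Guoyin
About the authors: YANG Jie (born 1987), male, from Zunyi, Guizhou, is a Ph.D. candidate whose research interests include granular computing, rough sets, and data mining. WANG Guoyin (born 1970), male, from Chongqing, is a professor with a Ph.D. and a CCF member whose research interests include granular computing, soft computing, and cognitive computing. WANG Fei (born 1989), male, from Kaifeng, Henan, is an M.S. candidate whose research interests include data mining and granular computing.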

Abstract

Abstract: Density Peak Clustering (DPC), proposed in 2014, is simple and novel: it requires few parameters and no iteration, and it is scalable. Building on DPC, this paper proposes a grid clustering algorithm that handles large-scale data efficiently. First, the N-dimensional space is divided into disjoint rectangular grid cells, and the statistics of each cell are collected. Then the central cells are identified using the DPC criterion for centers: a central cell is surrounded by grid cells of lower local density and lies relatively far from any grid cell of higher local density. Finally, the grid cells near each central cell are merged with it to obtain the clustering result. Simulation experiments on UCI artificial data sets show that the proposed algorithm finds the cluster centers quickly and handles the clustering of large-scale data effectively. Compared with the original DPC algorithm, it reduces the running time to between 1/100 and 1/10 of the original on different data sets, while keeping the loss of accuracy at 5% to 8%.

Key words: density peak, grid granulation, large-scale data, clustering
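The three steps summarized in the abstract (grid partition, center selection by density peaks, merging) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `grid_density_peaks`, the parameters `cells_per_dim` and `n_clusters`, the use of point counts as the cell density, the product of density and distance as the center score, and the rule that each cell joins its nearest denser cell are all assumptions made for illustration, borrowed from the general density peak idea.

```python
import numpy as np

def grid_density_peaks(points, cells_per_dim=10, n_clusters=2):
    """Illustrative grid + density-peak clustering (assumed details, see lead-in)."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero

    # Step 1: map every point to a disjoint rectangular cell.
    idx = np.floor((points - lo) / span * cells_per_dim).astype(int)
    idx = np.clip(idx, 0, cells_per_dim - 1)

    # Step 2: per-cell statistics; here only the point count (local density).
    cells, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)
    rho = counts.astype(float)

    # Step 3: for each cell, distance (in grid units) to the nearest denser cell.
    dist = np.linalg.norm(cells[:, None, :] - cells[None, :, :], axis=-1)
    order = np.argsort(-rho, kind="stable")  # cells by decreasing density
    delta = np.zeros(len(cells))
    parent = np.full(len(cells), -1)
    for rank in range(1, len(order)):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i], parent[i] = dist[i, j], j

    # Central cells: high density AND far from any denser cell.
    score = rho * delta
    score[order[0]] = np.inf  # the densest cell is always a center
    centers = np.argsort(-score, kind="stable")[:n_clusters]

    # Step 4: merge remaining cells into the cluster of their nearest denser cell.
    cell_label = np.full(len(cells), -1)
    cell_label[centers] = np.arange(len(centers))
    for i in order:
        if cell_label[i] == -1:
            cell_label[i] = cell_label[parent[i]]

    return cell_label[inverse], cells[centers]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal((0, 0), 0.3, (200, 2)),
                      rng.normal((10, 10), 0.3, (200, 2))])
    labels, center_cells = grid_density_peaks(data)
    print(np.bincount(labels), center_cells)
```

Because the pairwise distances are computed between occupied cells and not between individual points, the cost of the center search depends on the number of occupied cells, which is the intuition behind the speed-up the abstract reports. The trade-off is that points sharing a cell always receive the same label, which is one plausible source of the accuracy loss.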


CLC Number: TP311

YANG Jie, WANG Guoyin, WANG Fei. Grid clustering algorithm based on density peaks[J]. Journal of Computer Applications, 2017, 37(11): 3080-3084.


