Clustering by density and distance analysis based on genetic algorithm

doi:10.11772/j.issn.1001-9081.2015.11.3243

Journal of Computer Applications, 2015, Vol. 35, Issue (11): 3243-3246.

WANG Ze, ZHANG Hongjun, ZHANG Rui, HE Dengchao

Institute of Command and Information System, PLA University of Science and Technology, Nanjing Jiangsu 210007, China

Received: 2015-06-15 Revised: 2015-08-10 Published: 2015-11-13
Corresponding author: WANG Ze (born 1991), male, from Changsha, Hunan; M.S. candidate. Research interests: pattern recognition, data mining.
About the authors: ZHANG Hongjun (born 1963), male, from Taizhou, Jiangsu; professor, doctoral supervisor, Ph.D. Research interests: data mining, knowledge engineering. ZHANG Rui (born 1977), male, from Wendeng, Shandong; associate professor, Ph.D. Research interests: data mining, military modeling. HE Dengchao (born 1989), male, from Nanjing, Jiangsu; Ph.D. candidate. Research interests: machine learning, data mining.
Supported by:
National Social Science Foundation of China (13QJ004-098); Natural Science Foundation of Jiangsu Province (BK20150720).

Abstract

To address the difficulty of selecting cluster centers and the weak generalization of data-point density computation, a clustering method based on a genetic algorithm and on density and distance calculation is proposed. The method computes data-point density with an exponential method, which reduces the influence of parameters on performance. It uses a genetic algorithm to search for the optimal density and distance thresholds and introduces a penalty factor to counter drift of the search region, which speeds up convergence. The thresholds identify the optimal cluster centers, and the remaining points are then assigned to clusters by an attribution method. Experiments on 4 artificial data sets and 4 UCI data sets show that, on four clustering evaluation measures (Rand index, clustering accuracy, clustering purity, and recall), the method performs as well as or better than K-means, the fast search clustering algorithm, and Max_Min_SD.

Key words: genetic algorithm, clustering, density, distance
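The abstract does not give the formulas, so the following is only a minimal sketch of the density-and-distance step, not the authors' implementation. It assumes a Gaussian-style exponential kernel for density and the "distance to the nearest denser point" measure from density-peak clustering. The function name `density_distance_clusters`, the cutoff `d_c`, and the thresholds `rho_min` and `delta_min` are hypothetical; in the paper the thresholds are found by the genetic algorithm, whereas here they are simply passed in by hand.

```python
import math


def density_distance_clusters(points, d_c, rho_min, delta_min):
    """Pick cluster centers by density/distance thresholds, then assign the rest.

    Assumed formulas (not taken from the paper):
      rho_i   = sum_j exp(-(d_ij / d_c)^2)        exponential-kernel density
      delta_i = distance to the nearest point that ranks denser than i
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Exponential density: every point contributes, nearer points contribute more.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]

    # Rank points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n
    parent = [-1] * n  # nearest denser point, used for assignment
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor; use its largest distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers are points that exceed both thresholds.
    labels = [-1] * n
    centers = []
    for i in order:
        if rho[i] > rho_min and delta[i] > delta_min:
            labels[i] = len(centers)
            centers.append(i)

    # Assignment: each remaining point inherits the label of its nearest denser point.
    # Walking in density order guarantees the parent is labeled first.
    for i in order:
        if labels[i] == -1 and parent[i] != -1:
            labels[i] = labels[parent[i]]
    return centers, labels
```

A point keeps the label `-1` only when no center lies on its chain of denser neighbors, which happens if the thresholds are too strict. A threshold search such as the genetic algorithm described in the abstract would call a routine like this repeatedly with candidate `(rho_min, delta_min)` pairs and score the resulting partition.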

CLC number: TP391.4

WANG Ze, ZHANG Hongjun, ZHANG Rui, HE Dengchao. Clustering by density and distance analysis based on genetic algorithm[J]. Journal of Computer Applications, 2015, 35(11): 3243-3246.

