Selective K-means clustering ensemble based on random sampling

Journal of Computer Applications, 2013, Vol. 33, Issue (07): 1969-1972. DOI: 10.11772/j.issn.1001-9081.2013.07.1969

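WANG Lijuan¹, HAO Zhifeng¹,², CAI Ruichu², WEN Wen²

1. School of Computer Science and Engineering, South China University of Technology, Guangzhou Guangdong 510006, China
2. Faculty of Computer, Guangdong University of Technology, Guangzhou Guangdong 510006, China

Received: 2013-03-28 Revised: 2013-05-10 Published: 2013-07-01 Online: 2013-07-06
Contact: WANG Lijuan
About the authors: WANG Lijuan (1978-), female, born in Xingtai, Hebei, Ph. D., lecturer; research interests: machine learning, data mining. HAO Zhifeng (1968-), male, born in Suzhou, Jiangsu, Ph. D., professor; research interests: machine learning, evolutionary computation. CAI Ruichu (1983-), male, born in Wenzhou, Zhejiang, Ph. D., associate professor; research interests: machine learning, bioinformatics. WEN Wen (1981-), female, born in Ganzhou, Jiangxi, Ph. D., associate professor; research interests: machine learning, image recognition.
Funding:
National Natural Science Foundation of China (61070033, 61100148, 61202269); Natural Science Foundation of Guangdong Province (S2011040004804); Science and Technology Planning Project of Guangdong Province (2010B050400011); Open Project of the State Key Laboratory for Novel Software Technology (KFKT2011B19); Cultivation Program for Outstanding Young Innovative Talents in Higher Education Institutions of Guangdong (LYM11060); Science and Technology Planning Project of Guangzhou (12C42111607, 201200000031); Science and Technology Planning Project of Panyu District (2012-Z-03-67)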

Abstract

Without prior information about the data distribution, the parameters, or the class labels of the data, the correctness of some base clusterings cannot be guaranteed, and unreliable base clusterings degrade the performance of the clustering ensemble. Moreover, different base clustering decisions contribute differently to the ensemble, so treating them all equally limits how much the ensemble result can improve. To address this problem, this paper proposed a selective K-means clustering ensemble based on random sampling, called RS-KMCE. In RS-KMCE, the random sampling strategy keeps the selection of the base clustering subset from being trapped in a local minimum, and a comprehensive evaluation value defined from diversity and accuracy helps the algorithm converge quickly to a better base clustering subset, improving the performance of the ensemble. Experimental results on two synthetic datasets and four UCI datasets show that RS-KMCE performs better than K-means, the K-means clustering ensemble (KMCE), and the selective K-means clustering ensemble based on Bagging (BA-KMCE).

Key words: clustering ensemble, selective clustering ensemble, random sampling, evaluation index of clustering, K-means
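The abstract describes RS-KMCE only at a high level: K-means base clusterings, random sampling of candidate base clustering subsets, and a combined evaluation of diversity and accuracy. The exact evaluation formula and consensus function are not given on this page, so the Python sketch below fills them in with labeled assumptions: the "accuracy" of a base clustering is approximated by its average normalized mutual information (NMI) with all base clusterings (no labels are available), pairwise diversity is taken as 1 − NMI, the combined score is a hypothetical weighted sum controlled by `alpha`, and the final partition comes from a simple co-association majority vote joined with union-find. The function names `subset_score`, `select_by_random_sampling`, and `consensus` are illustrative, not the authors' code.

```python
import math
import random
from collections import Counter
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means, initialized with k randomly sampled points."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
                  for x in points]
        # Move each center to the mean of its members; an empty cluster keeps its old center.
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def nmi(a, b):
    """Normalized mutual information between two labelings (sqrt normalization)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(nij / n * math.log(n * nij / (ca[i] * cb[j])) for (i, j), nij in cab.items())
    ha = -sum(v / n * math.log(v / n) for v in ca.values())
    hb = -sum(v / n * math.log(v / n) for v in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)


def subset_score(subset, quality, sim, alpha):
    """HYPOTHETICAL combined evaluation: alpha * mean accuracy proxy + (1 - alpha) * mean diversity."""
    acc = sum(quality[i] for i in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    div = sum(1 - sim[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return alpha * acc + (1 - alpha) * div


def select_by_random_sampling(base, m, trials, rng, alpha=0.5):
    """Score `trials` random size-m subsets of the base clusterings and keep the best one."""
    t = len(base)
    sim = [[nmi(base[i], base[j]) for j in range(t)] for i in range(t)]
    # Assumption: without labels, average NMI with every base clustering stands in for accuracy.
    quality = [sum(row) / t for row in sim]
    best, best_score = None, -math.inf
    for _ in range(trials):
        subset = sorted(rng.sample(range(t), m))
        s = subset_score(subset, quality, sim, alpha)
        if s > best_score:
            best, best_score = subset, s
    return best, best_score


def consensus(selected, n, threshold=0.5):
    """Co-association vote: link two points grouped together in more than `threshold` of the clusterings."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        votes = sum(lab[i] == lab[j] for lab in selected) / len(selected)
        if votes > threshold:
            parent[find(i)] = find(j)
    roots = {}
    # Relabel connected components as 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic 2-D Gaussian blobs (illustrative data, not the paper's datasets).
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(30)] + \
           [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(30)]
    # Base clusterings: K-means with a random k in {2, 3, 4} and random initial centers.
    base = [kmeans(data, rng.choice([2, 3, 4]), rng) for _ in range(10)]
    subset, score = select_by_random_sampling(base, m=5, trials=50, rng=rng)
    labels = consensus([base[i] for i in subset], len(data))
    print(f"selected base clusterings: {subset}, score: {score:.3f}")
    print(f"number of consensus clusters: {len(set(labels))}")
```

Scoring many randomly drawn subsets, instead of growing one subset greedily, is one simple reading of the abstract's claim that random sampling keeps subset selection out of local minima; the values of `m`, `trials`, and `alpha` here are arbitrary illustrative choices.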

CLC number: TP181

WANG Lijuan, HAO Zhifeng, CAI Ruichu, WEN Wen. Selective K-means clustering ensemble based on random sampling[J]. Journal of Computer Applications, 2013, 33(07): 1969-1972.
