Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (03): 638-642. DOI: 10.3724/SP.J.1087.2012.00638

• Artificial Intelligence •

Improved rival penalized competitive learning algorithm based on spatial distribution density of samples

XIE Juan-ying1,2, GUO Wen-juan1, XIE Wei-xin2,3, GAO Xin-bo2

  1. School of Computer Science, Shaanxi Normal University, Xi'an Shaanxi 710062, China;
    2. School of Electronic Engineering, Xidian University, Xi'an Shaanxi 710071, China;
    3. School of Information Engineering, Shenzhen University, Shenzhen Guangdong 518060, China
  • Received: 2011-09-01  Revised: 2011-11-24  Online: 2012-03-01  Published: 2012-03-01
  • Corresponding author: XIE Juan-ying
  • About the authors: XIE Juan-ying (1971-), female, born in Xi'an, Shaanxi, associate professor, CCF member; her research interests include intelligent information processing, pattern recognition, machine learning and data mining. GUO Wen-juan (1986-), female, born in Wuwei, Gansu, M.S. candidate; her research interests include intelligent information processing and pattern recognition. XIE Wei-xin (1941-), male, born in Guangzhou, Guangdong, professor and Ph.D. supervisor; his research interests include intelligent information processing, target recognition, intelligent human-computer interaction, image processing and pattern recognition. GAO Xin-bo (1972-), male, born in Laiwu, Shandong, professor and Ph.D. supervisor; his research interests include machine learning, computational intelligence, visual information and wireless communication.
  • Supported by: the Fundamental Research Funds for the Central Universities (GK200901006, GK201001003) and the Natural Science Basic Research Plan of Shaanxi Province (2010JM3004).



Abstract: The original Rival Penalized Competitive Learning (RPCL) algorithm ignores the influence of the geometric structure of a dataset on the weight adjustment of its nodes. The new RPCL algorithm proposed by Wei et al. (WEI LIMEI, XIE WEIXIN. A new competitive learning algorithm for clustering analysis. Journal of Electronics, 2000, 22(1): 13-18) overcame this drawback by introducing a sample density into the weight adjustment, but its density definition was subjective. To overcome the shortcomings of both algorithms, this paper defined a new sample density based on the natural distribution of the samples in a dataset and introduced it into the weight adjustment of RPCL. The improved RPCL algorithm was tested on well-known datasets from the UCI machine learning repository and on randomly generated synthetic datasets containing noisy samples. The algorithms were compared in terms of the accuracy in determining the number of clusters, running time, sum of squared clustering errors, and the Rand index, Jaccard coefficient and adjusted Rand index of the clustering results. The experimental results show that the improved RPCL algorithm clearly outperforms both the original RPCL and the algorithm of Wei et al.: it achieves better clustering results and is much more robust to noisy data. The analyses demonstrate that the improved algorithm not only determines a reasonable number of clusters from the natural distribution of the samples, but also finds suitable cluster centers, improves clustering accuracy, and converges toward the globally optimal clustering as quickly as possible.
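To make the mechanism concrete, the sketch below combines the classical RPCL winner/rival update with a k-nearest-neighbour sample density that scales the learning rates. It is only an illustrative reconstruction: the helper `knn_density`, its parameter `k`, and the way the density enters the update rule are assumptions, since the abstract does not give the authors' exact density definition.

```python
import numpy as np

def knn_density(X, k=10):
    """Illustrative sample density: inverse of the mean distance to the k
    nearest neighbours, normalised to (0, 1]. This is an assumption; the
    paper derives its own density from the natural distribution of samples."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    knn = np.sort(dist, axis=1)[:, 1:k + 1]        # drop the zero self-distance
    dens = 1.0 / (knn.mean(axis=1) + 1e-12)
    return dens / dens.max()

def density_rpcl(X, k_max=10, alpha_c=0.05, alpha_r=0.002, epochs=50, seed=0):
    """Minimal RPCL sketch: for each sample the winner node is attracted and
    the rival (second winner) is repelled, so surplus centres drift away and
    the number of clusters emerges automatically; here the sample density
    additionally scales the step size."""
    rng = np.random.default_rng(seed)
    density = knn_density(X)
    W = X[rng.choice(len(X), size=k_max, replace=False)].astype(float)
    wins = np.ones(k_max)                          # frequency-sensitive counts
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            gamma = wins / wins.sum()              # conscience weighting
            d = gamma * ((W - x) ** 2).sum(axis=1)
            c = int(np.argmin(d))                  # winner
            d[c] = np.inf
            r = int(np.argmin(d))                  # rival
            W[c] += density[i] * alpha_c * (x - W[c])   # attract the winner
            W[r] -= density[i] * alpha_r * (x - W[r])   # penalize the rival
            wins[c] += 1
    return W
```

In practice the de-learning rate `alpha_r` is kept much smaller than `alpha_c`, and centres that finish far from every sample are discarded; this is how RPCL reveals the number of clusters without fixing it in advance.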

Key words: clustering, Rival Penalized Competitive Learning (RPCL) algorithm, sample density, cluster number, cluster center
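The abstract evaluates clustering quality with the Rand index, the Jaccard coefficient and the adjusted Rand index. These are standard pair-counting external indices; the compact reference implementation below (not the authors' code) shows how all three follow from the contingency table of the true and predicted labels.

```python
import numpy as np
from scipy.special import comb

def pair_counting_indices(labels_true, labels_pred):
    """Rand index, Jaccard coefficient and adjusted Rand index from the
    standard pair-counting formulas."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = labels_true.size
    _, ti = np.unique(labels_true, return_inverse=True)
    _, pj = np.unique(labels_pred, return_inverse=True)
    cont = np.zeros((ti.max() + 1, pj.max() + 1), dtype=np.int64)
    np.add.at(cont, (ti, pj), 1)                    # contingency table

    same_both = comb(cont, 2).sum()                 # pairs together in both labelings
    same_true = comb(cont.sum(axis=1), 2).sum()     # pairs together in the true labeling
    same_pred = comb(cont.sum(axis=0), 2).sum()     # pairs together in the prediction
    total = comb(n, 2)

    a = same_both
    b = same_true - same_both                       # together in truth, split in prediction
    c = same_pred - same_both                       # split in truth, together in prediction
    d = total - a - b - c                           # apart in both

    rand = (a + d) / total
    jaccard = a / (a + b + c)
    expected = same_true * same_pred / total
    ari = (same_both - expected) / (0.5 * (same_true + same_pred) - expected)
    return rand, jaccard, ari
```

For a cross-check, scikit-learn's `adjusted_rand_score` returns the same ARI value.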

CLC number: