Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (03): 638-642. DOI: 10.3724/SP.J.1087.2012.00638

• Artificial Intelligence •

Improved rival penalized competitive learning algorithm based on spatial distribution density of samples

XIE Juan-ying1,2, GUO Wen-juan1, XIE Wei-xin2,3, GAO Xin-bo2

  1. School of Computer Science, Shaanxi Normal University, Xi'an Shaanxi 710062, China;
    2. School of Electronic Engineering, Xidian University, Xi'an Shaanxi 710071, China;
    3. School of Information Engineering, Shenzhen University, Shenzhen Guangdong 518060, China
  • Received: 2011-09-01  Revised: 2011-11-24  Online: 2012-03-01  Published: 2012-03-01
  • Corresponding author: XIE Juan-ying
  • About the authors: XIE Juan-ying (1971-), female, born in Xi'an, Shaanxi, associate professor, CCF member; her research interests include intelligent information processing, pattern recognition, machine learning and data mining. GUO Wen-juan (1986-), female, born in Wuwei, Gansu, M.S. candidate; her research interests include intelligent information processing and pattern recognition. XIE Wei-xin (1941-), male, born in Guangzhou, Guangdong, professor and Ph.D. supervisor; his research interests include intelligent information processing, target recognition, intelligent human-computer interaction, image processing and pattern recognition. GAO Xin-bo (1972-), male, born in Laiwu, Shandong, professor and Ph.D. supervisor; his research interests include machine learning, computational intelligence, visual information and wireless communication.
  • Supported by: the Fundamental Research Funds for the Central Universities (GK200901006, GK201001003) and the Natural Science Basic Research Plan of Shaanxi Province (2010JM3004).



Abstract: The original Rival Penalized Competitive Learning (RPCL) algorithm ignores the influence of the geometric structure of a dataset on the weight adjustment of its nodes. The new RPCL algorithm proposed by Wei et al. (WEI LIMEI, XIE WEIXIN. A new competitive learning algorithm for clustering analysis. Journal of Electronics, 2000, 22(1): 13-18) overcame this drawback by introducing a sample density into the weight adjustment, but its density definition was subjective. To overcome the shortcomings of both algorithms, this paper defined a new sample density based on the natural distribution of the samples in a dataset and introduced it into the weight adjustment of RPCL. The improved RPCL algorithm was tested on well-known datasets from the UCI machine learning repository and on randomly generated synthetic datasets containing noisy samples. The algorithms were compared in terms of the accuracy in determining the number of clusters, running time, sum of squared clustering errors, and the Rand index, Jaccard coefficient and adjusted Rand index of the clustering results. The experimental results show that the improved RPCL algorithm clearly outperforms both the original RPCL and the algorithm of Wei et al.: it achieves better clustering results and is much more robust to noisy data. The analyses demonstrate that the improved algorithm not only determines a reasonable number of clusters from the natural distribution of the samples, but also finds suitable cluster centers, improves clustering accuracy, and converges toward the globally optimal clustering as quickly as possible.
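To make the mechanism concrete, the sketch below combines the classical RPCL winner/rival update with a k-nearest-neighbour sample density that scales the learning rates. It is only an illustrative reconstruction: the helper `knn_density`, its parameter `k`, and the way the density enters the update rule are assumptions, since the abstract does not give the authors' exact density definition.

```python
import numpy as np

def knn_density(X, k=10):
    """Illustrative sample density: inverse of the mean distance to the k
    nearest neighbours, normalised to (0, 1]. This is an assumption; the
    paper derives its own density from the natural distribution of samples."""
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    knn = np.sort(dist, axis=1)[:, 1:k + 1]        # drop the zero self-distance
    dens = 1.0 / (knn.mean(axis=1) + 1e-12)
    return dens / dens.max()

def density_rpcl(X, k_max=10, alpha_c=0.05, alpha_r=0.002, epochs=50, seed=0):
    """Minimal RPCL sketch: for each sample the winner node is attracted and
    the rival (second winner) is repelled, so surplus centres drift away and
    the number of clusters emerges automatically; here the sample density
    additionally scales the step size."""
    rng = np.random.default_rng(seed)
    density = knn_density(X)
    W = X[rng.choice(len(X), size=k_max, replace=False)].astype(float)
    wins = np.ones(k_max)                          # frequency-sensitive counts
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            gamma = wins / wins.sum()              # conscience weighting
            d = gamma * ((W - x) ** 2).sum(axis=1)
            c = int(np.argmin(d))                  # winner
            d[c] = np.inf
            r = int(np.argmin(d))                  # rival
            W[c] += density[i] * alpha_c * (x - W[c])   # attract the winner
            W[r] -= density[i] * alpha_r * (x - W[r])   # penalize the rival
            wins[c] += 1
    return W
```

In practice the de-learning rate `alpha_r` is kept much smaller than `alpha_c`, and centres that finish far from every sample are discarded; this is how RPCL reveals the number of clusters without fixing it in advance.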

Key words: clustering, Rival Penalized Competitive Learning (RPCL) algorithm, sample density, cluster number, cluster center
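The abstract evaluates clustering quality with the Rand index, the Jaccard coefficient and the adjusted Rand index. These are standard pair-counting external indices; the compact reference implementation below (not the authors' code) shows how all three follow from the contingency table of the true and predicted labels.

```python
import numpy as np
from scipy.special import comb

def pair_counting_indices(labels_true, labels_pred):
    """Rand index, Jaccard coefficient and adjusted Rand index from the
    standard pair-counting formulas."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    n = labels_true.size
    _, ti = np.unique(labels_true, return_inverse=True)
    _, pj = np.unique(labels_pred, return_inverse=True)
    cont = np.zeros((ti.max() + 1, pj.max() + 1), dtype=np.int64)
    np.add.at(cont, (ti, pj), 1)                    # contingency table

    same_both = comb(cont, 2).sum()                 # pairs together in both labelings
    same_true = comb(cont.sum(axis=1), 2).sum()     # pairs together in the true labeling
    same_pred = comb(cont.sum(axis=0), 2).sum()     # pairs together in the prediction
    total = comb(n, 2)

    a = same_both
    b = same_true - same_both                       # together in truth, split in prediction
    c = same_pred - same_both                       # split in truth, together in prediction
    d = total - a - b - c                           # apart in both

    rand = (a + d) / total
    jaccard = a / (a + b + c)
    expected = same_true * same_pred / total
    ari = (same_both - expected) / (0.5 * (same_true + same_pred) - expected)
    return rand, jaccard, ari
```

For a cross-check, scikit-learn's `adjusted_rand_score` returns the same ARI value.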

CLC number: