Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (09): 1001-9081. DOI: 10.11772/j.issn.1001-9081.2013.09.2482

• Database technology •

K-medoids algorithm based on improved manifold distance

QIU Xingxing, CHENG Xiao

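  1. School of Information Science and Technology, Jiujiang University, Jiujiang, Jiangxi 332005
  • Received: 2013-04-01 Revised: 2013-04-27 Published: 2013-09-01 Online: 2013-10-18
  • Corresponding author: QIU Xingxing
  • About the authors: QIU Xingxing (born 1979), male, from Jiujiang, Jiangxi, lecturer, M.S., CCF member; main research interests: data mining and evolutionary computation;
    CHENG Xiao (born 1978), male, from Jiujiang, Jiangxi, lecturer, M.S.; main research interests: programming methodology, data mining, and evolutionary computation.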

K-medoids algorithm based on improved manifold distance

QIU Xingxing, CHENG Xiao

  1. School of Information Science and Technology, Jiujiang University, Jiujiang, Jiangxi 332005, China
  • Received: 2013-04-01 Revised: 2013-04-27 Online: 2013-10-18 Published: 2013-09-01
  • Contact: QIU Xingxing

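Abstract: To cluster data with complex spatial distributions, as well as real-world data whose spatial distribution is unknown, an improved manifold distance is designed as a dissimilarity measure. This measure effectively exploits the global consistency among all data points and mines the spatial distribution information of unlabeled data sets. Using this dissimilarity measure, a K-medoids algorithm based on the improved manifold distance is proposed. The new algorithm is compared with K-medoids algorithms based on the existing manifold distance and on the Euclidean distance. Experimental results on eight artificial data sets and on the USPS handwritten digit recognition problem show that, across test data sets with different structures, the clustering performance of the new algorithm is better than or close to that of the other two K-medoids algorithms, and that it can cluster data with various distributions, whether simple or complex, convex or non-convex.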

Keywords: dissimilarity measure, K-medoids algorithm, clustering, manifold distance, pattern recognition

Abstract: In this paper, a dissimilarity measure based on an improved manifold distance is designed to identify clusters in data sets with complex distributions and in real-world data sets whose distribution is unknown. By exploiting the global consistency among all data points, this measure can mine the spatial distribution information of data sets that have no class labels. Using this dissimilarity measure, a K-medoids algorithm based on the improved manifold distance is proposed. Experimental results on eight artificial data sets with different structures and on the USPS handwritten digit data set indicate that the new algorithm outperforms, or performs similarly to, two other K-medoids algorithms, one based on the existing manifold distance and one based on the Euclidean distance, and that it can identify clusters with simple or complex, convex or non-convex distributions.

Key words: dissimilarity measure, K-medoids algorithm, clustering, manifold distance, pattern recognition
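
The abstract does not give the formula of the improved manifold distance, so the following is only an illustrative sketch of the general approach it describes: run K-medoids on a precomputed dissimilarity matrix instead of on raw Euclidean distances. The function names, the parameter `rho`, and the stand-in measure (shortest paths over exponentially stretched edge lengths) are assumptions made for illustration and are not the measure defined in the paper.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def path_based_dissimilarity(points, rho=2.0):
    """Stand-in manifold-style dissimilarity (an assumption, NOT the paper's measure).

    Each edge length is rho**euclidean - 1, so long jumps are penalized far
    more than chains of short hops. The dissimilarity is the shortest path
    over the complete graph, computed with Floyd-Warshall (O(n^3)).
    """
    n = len(points)
    d = [[rho ** euclidean(points[i], points[j]) - 1.0 for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def assign(dissim, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dissim[i][medoids[c]])
            for i in range(len(dissim))]


def k_medoids(dissim, k, max_iter=100, seed=0):
    """Alternating K-medoids on a precomputed dissimilarity matrix."""
    n = len(dissim)
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        labels = assign(dissim, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster happens to be empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # dissimilarity to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dissim[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dissim, medoids)


# Usage: two small, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(path_based_dissimilarity(points), k=2)
print(medoids, labels)
```

Because the clustering step only reads the matrix, swapping in a different dissimilarity measure, such as the Euclidean distance used as a baseline in the abstract, requires changing only the function that builds the matrix.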
