Journal of Computer Applications (计算机应用)

• Database technology

适用于区间数据的基于相互距离的相似性传播聚类

谢信喜 王士同   

  1. School of Information Engineering, Jiangnan University
  • 收稿日期:2007-12-06 修回日期:2008-01-15 发布日期:2008-06-01 出版日期:2008-06-01
  • 通讯作者: 谢信喜

Affinity propagation clustering for symbolic interval data based on mutual distances

Xin-xi XIE, Shi-tong WANG

  • Received:2007-12-06 Revised:2008-01-15 Online:2008-06-01 Published:2008-06-01
  • Contact: Xin-xi XIE

摘要: 符号聚类是对传统聚类的重要扩展,而区间数据是一类常见的符号数据。传统聚类中使用的对称性度量不一定适用于度量区间数据,且算法初始化也一直是干扰聚类的严重问题。因此,提出了一种适用于区间数据的度量--相互距离,并在此度量的基础上采用了一种全新的聚类方法--相似性传播聚类,解决了初始化干扰问题,从而得出了适用于区间数据的基于相互距离的相似性传播聚类。通过理论阐述和实验比较,说明了该算法比基于欧氏聚类的K-均值算法要好。

关键词: 符号聚类, 区间数据, 相互距离, 相似性传播, K-均值

Abstract: Symbolic data clustering is an important extension of conventional clustering, and interval data are a common type of symbolic data. The symmetric measures used in conventional clustering algorithms are not always suitable for interval data, and initialization is a further serious problem that can disturb the clustering result. A measure for interval data, called mutual distance, is therefore proposed. Based on this measure, affinity propagation, a new clustering method that avoids the initialization problem, is adopted, yielding affinity propagation clustering for symbolic interval data based on mutual distance. Theoretical analysis and experimental comparison indicate that the proposed algorithm outperforms K-means based on Euclidean distance on symbolic interval data.

Key words: symbolic clustering, interval data, mutual distance, affinity propagation, K-means
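
To make the pipeline described in the abstract concrete, the following is a minimal sketch of affinity propagation applied to interval data. It rests on several assumptions: the abstract does not define the mutual distance, so `interval_distance` is a hypothetical stand-in (the per-feature Hausdorff distance between intervals, summed over features) and is not the authors' measure. The sample data, the median preference, the damping factor, and the iteration count are likewise illustrative choices. The message-passing updates follow the standard responsibility/availability rules of affinity propagation.

```python
from statistics import median


def interval_distance(x, y):
    """Stand-in dissimilarity between two interval vectors.

    Each item is a list of (low, high) pairs. This sums the Hausdorff
    distance between corresponding intervals. It is an assumption used
    for illustration, NOT the paper's mutual distance.
    """
    return sum(max(abs(xl - yl), abs(xh - yh))
               for (xl, xh), (yl, yh) in zip(x, y))


def affinity_propagation(S, damping=0.7, iterations=300):
    """Plain affinity propagation on a similarity matrix S (list of lists).

    S[k][k] holds the preference of item k. Returns (exemplars, labels),
    where labels[i] is the index of the exemplar chosen for item i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                rival = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) <- sum_{i' != k} max(0, r(i',k))
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(support) - support[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return [], [None] * n
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


# Illustrative data: six items, two interval-valued features each.
data = [
    [(0.0, 1.0), (0.0, 1.0)],
    [(0.2, 1.1), (0.1, 1.3)],
    [(0.5, 1.6), (0.3, 1.2)],
    [(10.0, 11.0), (10.0, 11.0)],
    [(10.3, 11.2), (9.8, 11.1)],
    [(10.6, 11.9), (10.2, 11.4)],
]
n = len(data)
# Similarity is the negative dissimilarity; no initial centers are chosen.
S = [[-interval_distance(a, b) for b in data] for a in data]
# Assumed preference: the median off-diagonal similarity, shared by all items.
preference = median(S[i][j] for i in range(n) for j in range(n) if i != j)
for i in range(n):
    S[i][i] = preference

exemplars, labels = affinity_propagation(S)
print("exemplars:", exemplars)
print("labels:", labels)
```

Unlike K-means, this procedure takes no initial cluster centers: every item starts as a candidate exemplar, and the number of clusters is governed by the preference values. Swapping in a different interval measure only requires replacing `interval_distance`.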