Journal of Computer Applications, 2010, Vol. 30, Issue (8): 2003-2005.

• Artificial Intelligence •

K-prototypes based clustering algorithm for data mixed with numeric and categorical values

Chen Wei (陈韡)1, Wang Lei (王雷)2, Jiang Ziyun (蒋子云)3

  1. School of Software, Hunan University
  2. Hunan University
  3. Institute of Electronic Innovation, School of Information Science and Engineering, Central South University
  • Received: 2010-02-07 Revised: 2010-03-07 Online: 2010-07-30 Published: 2010-08-01
  • Corresponding author: Chen Wei
  • Funding:
    National High Technology Research and Development Program of China (863 Program)

Abstract: This paper studies the clustering of data with mixed numeric and categorical attributes using the K-prototypes algorithm. First, the formula that K-prototypes uses to compute dissimilarity on categorical attributes is improved so that it reflects the differences between samples more precisely. A clustering algorithm for mixed-attribute data is then built on the improved formula and applied to the cluster analysis of English loanword data. Experimental results show that the improved algorithm is more stable and more accurate than the original K-prototypes algorithm.

Key words: clustering, k-prototypes algorithm, data with mixed numeric and categorical values, dissimilarity
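
For orientation, the baseline dissimilarity used by the standard K-prototypes algorithm can be sketched as below. This is not the improved formula proposed in the paper, which the abstract does not state. It is only the commonly cited baseline: squared Euclidean distance on numeric attributes plus a weighted count of mismatches on categorical attributes. The function name, parameter names, and the default weight `gamma` are assumptions made for illustration.

```python
from typing import Sequence


def kprototypes_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    gamma: float = 1.0,  # hypothetical default; balances the two parts
) -> float:
    """Baseline K-prototypes dissimilarity between a sample and a prototype."""
    # Numeric part: squared Euclidean distance over numeric attributes.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching, adding 1 per mismatched attribute.
    categorical = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


# One numeric attribute differs by 1.0 and one categorical attribute mismatches.
d = kprototypes_dissimilarity([1.0, 2.0], ["a", "b"], [0.0, 2.0], ["a", "c"], gamma=0.5)
```

Because the categorical part only counts mismatches, every mismatch contributes the same amount regardless of how the values are distributed within a cluster, which is the kind of coarseness an improved categorical dissimilarity would aim to reduce.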