Journal of Computer Applications, 2010, Vol. 30, Issue (10): 2614-2617.

• Database and data mining •

Semi-supervised automatic clustering

PAN Zhang-Ming

  1. Guangdong University of Finance
  • Received: 2010-04-09 Revised: 2010-06-09 Online: 2010-09-21 Published: 2010-10-01
  • Corresponding author: PAN Zhang-Ming

Abstract: Automatic clustering methods based on evolutionary algorithms suffer from low clustering accuracy and slow convergence on data sets whose clusters are loosely structured (non-compact). To address this, a semi-supervised automatic clustering algorithm is proposed. The algorithm modifies the chromosome decoding process: it first separates the cluster number and all centroids from the chromosome, then uses the nearest-neighbor rule to filter out invalid centroids that lie outside the region where the data are distributed, and finally embeds prior information to assist the K-means method in clustering the remaining centroids, which further refines the decoding result. Experimental results show that the algorithm gives fairly accurate clustering results on data sets with either compact or loose cluster structures.
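
The three decoding steps named in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the abstract does not specify the chromosome layout, the exact filtering criterion, or how the prior information enters K-means. The sketch assumes that the first gene holds the cluster number and the remaining genes are flattened candidate centroids, that a centroid is "invalid" when it is not the nearest centroid of any data point, and that the prior information is a small set of labelled points whose class means seed K-means. The names `decode`, `seeds` and `n_iter` are hypothetical.

```python
import numpy as np


def decode(chromosome, data, seeds, n_iter=20):
    """Hypothetical decoding of one chromosome into cluster centres.

    Assumed layout (not given in the abstract): chromosome[0] is the cluster
    number k; the remaining genes are candidate centroids, flattened row by row.
    `seeds` maps a class label to an array of labelled points and stands in
    for the prior information.
    """
    dim = data.shape[1]

    # Step 1: separate the cluster number and the candidate centroids.
    k = int(round(chromosome[0]))
    centroids = np.asarray(chromosome[1:], dtype=float).reshape(-1, dim)

    # Step 2: nearest-neighbour filter (assumed rule). A centroid survives
    # only if it is the nearest centroid of at least one data point.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    kept = centroids[np.unique(dists.argmin(axis=1))]
    k = max(1, min(k, len(kept)))

    # Step 3: K-means over the surviving centroids. Initial centres come from
    # the labelled class means (assumed seeding), padded with surviving
    # centroids when fewer than k classes are labelled.
    init = [np.asarray(pts, dtype=float).mean(axis=0) for pts in seeds.values()][:k]
    init += list(kept[: k - len(init)])
    centres = np.array(init, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(kept[:, None, :] - centres[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = kept[assign == j]
            if len(members):  # leave a centre unchanged if it attracts nothing
                centres[j] = members.mean(axis=0)
    return k, centres
```

In an evolutionary loop, a function of this kind would be called once per chromosome before the fitness of the decoded partition is evaluated; the fitness function itself is outside the scope of this sketch.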

Key words: semi-supervised clustering, automatic clustering, differential evolution, global optimization, K-means

CLC Number: