Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (07): 1933-1935.

• Database Technology •

K-means text clustering algorithm based on density and nearest neighbor

Zhang Wenming1, Wu Jiang1, Yuan Xiaojiao2

  1. School of Information Science and Technology, Northwest University
  2.
  • Received: 2010-01-20 Revised: 2010-03-08 Online: 2010-07-01 Published: 2010-07-01
  • Corresponding author: Zhang Wenming
  • Supported by:
    Northwest University Research Start-up Fund; Northwest University Graduate Student Independent Innovation Fund Project

Abstract: The choice of initial centers strongly affects the clustering quality of the traditional K-means algorithm and can easily trap the clustering in a local optimum. To address this problem, an algorithm for generating the initial cluster centers is proposed that draws on the ideas of density and nearest neighbors. The selected centers are then used to seed K-means, yielding DN-K-means, an algorithm better suited to text clustering. Experimental results show that the algorithm produces clusterings of relatively high quality and good stability.

Key words: text clustering, density, nearest neighbor, F-measure
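
The abstract does not spell out how density and nearest neighbors are combined, so the following is only a minimal sketch of one plausible reading, not the authors' DN-K-means. It assumes that density is the inverse of the mean distance to a point's nearest neighbors, and that each chosen seed removes its own nearest neighbors from the candidate pool. The names `density_nn_seeds`, `kmeans`, and `n_neighbors` are hypothetical, and Euclidean distance is used for simplicity.

```python
import numpy as np


def density_nn_seeds(X, k, n_neighbors=5):
    """Pick k initial centers: densest point first, then skip its neighbors.

    Assumption (not from the paper): density is the inverse of the mean
    distance to the n_neighbors nearest points.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    if not 1 <= n_neighbors < n:
        raise ValueError("n_neighbors must be between 1 and n - 1")

    # Pairwise Euclidean distances; the diagonal is masked so that a point
    # is never counted as its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nn_idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    nn_dist = np.take_along_axis(dist, nn_idx, axis=1)
    density = 1.0 / (nn_dist.mean(axis=1) + 1e-12)

    available = np.ones(n, dtype=bool)
    chosen = np.zeros(n, dtype=bool)
    seeds = []
    for _ in range(k):
        if not available.any():
            # Every point was excluded: fall back to all unchosen points.
            available = ~chosen
        candidates = np.flatnonzero(available)
        best = candidates[np.argmax(density[candidates])]
        seeds.append(best)
        chosen[best] = True
        # Exclude the seed and its nearest neighbors from later rounds.
        available[best] = False
        available[nn_idx[best]] = False
    return X[seeds]


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations started from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

A call such as `kmeans(X, density_nn_seeds(X, k=3))` runs both steps. For text data, `X` would typically hold document vectors such as TF-IDF rows. If those rows are L2-normalized, ranking by Euclidean distance matches ranking by cosine distance, which is why Euclidean distance is a reasonable simplification here.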