Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (07): 1933-1935.
• Database technology •
张文明1,吴江1,袁小蛟2
Abstract: The choice of initial center points strongly affects the clustering quality of the traditional K-means algorithm and can easily trap the clustering in a local optimum. To address this problem, an algorithm for generating the initial cluster centers is proposed that introduces the ideas of density and nearest neighbor. The selected centers are then used in the K-means algorithm, yielding DN-K-means, an improved algorithm for text clustering. Experimental results show that the algorithm produces clusterings of high quality and good stability.
Key words: text clustering, density, nearest neighbor, F-measure
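The idea summarized in the abstract, seeding K-means with points that are both dense and well separated, can be illustrated with a minimal sketch. This is not the authors' DN-K-means procedure, whose details are not given on this page. The function names `density_nn_init` and `kmeans`, the parameter `n_neighbors`, the density estimate (inverse mean distance to the nearest neighbors), the density-times-distance selection score, and the use of Euclidean distance on generic vectors are all assumptions made for illustration. Text clustering would more typically operate on TF-IDF vectors with a cosine measure.

```python
import numpy as np


def density_nn_init(X, k, n_neighbors=5):
    """Pick k initial centers that are dense and far from centers already chosen.

    Hypothetical seeding rule for illustration only, not the paper's algorithm.
    """
    # Pairwise Euclidean distances between all points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Density estimate: inverse of the mean distance to the nearest neighbors.
    # Column 0 of the sorted distances is the point itself, so it is skipped.
    nearest = np.sort(dist, axis=1)[:, 1:n_neighbors + 1]
    density = 1.0 / (nearest.mean(axis=1) + 1e-12)

    # First center: the densest point.
    chosen = [int(np.argmax(density))]
    while len(chosen) < k:
        # Distance from every point to its nearest already-chosen center.
        d_near = dist[:, chosen].min(axis=1)
        # Favor points that are dense and far from the existing centers.
        chosen.append(int(np.argmax(density * d_near)))
    return X[chosen].copy()


def kmeans(X, centers, n_iter=100):
    """Plain Lloyd iterations starting from the supplied centers."""
    centers = centers.copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its closest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Small demonstration on synthetic 2-D data (three separated groups).
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(30, 2))
    for c in ([0, 0], [10, 10], [0, 10])
])
labels_demo, centers_demo = kmeans(X_demo, density_nn_init(X_demo, 3))
print(np.bincount(labels_demo))
```

Because the seeds are computed deterministically from the data, repeated runs start from the same centers, which is one plausible reason a density-based initialization gives steadier results than random seeding.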
张文明, 吴江, 袁小蛟. K-means text clustering algorithm based on density and nearest neighbor [J]. Journal of Computer Applications, 2010, 30(07): 1933-1935.
URL: http://www.joca.cn/EN/
http://www.joca.cn/EN/Y2010/V30/I07/1933