K-means text clustering algorithm based on density and nearest neighbor

Journal of Computer Applications, 2010, Vol. 30, Issue (07): 1933-1935.

• Database technology •

Received:2010-01-20 Revised:2010-03-08 Online:2010-07-01 Published:2010-07-01

ZHANG Wenming¹, WU Jiang¹, YUAN Xiaojiao²

1. School of Information Science and Technology, Northwest University
2.

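Corresponding author: ZHANG Wenming
Supported by:
Northwest University Scientific Research Start-up Fund; Northwest University Graduate Independent Innovation Fund Project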

Abstract: The choice of initial center points strongly influences the clustering results of the traditional K-means algorithm and can easily trap the clustering in a local optimum. To address this problem, an algorithm for generating the initial cluster centers is proposed that introduces the ideas of density and nearest neighbors. The selected centers are then used in the K-means algorithm, yielding an improved text clustering algorithm called DN-K-means. Experimental results show that the algorithm produces clustering results of high quality and good stability.
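The abstract does not spell out how the initial centers are chosen, so the following Python sketch is only an illustration of the general idea, not the authors' DN-K-means procedure. It assumes a density proxy (the inverse of the mean distance to a point's nearest neighbors), picks dense points that lie far from the centers already chosen, and then runs standard K-means from those seeds. The function names `density_nn_init` and `kmeans`, the parameter `n_neighbors`, and the use of Euclidean distance (rather than a similarity measure over text vectors) are all assumptions made for this example.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_nn_init(points, k, n_neighbors=2):
    """Pick k seeds: dense points that are far from seeds already chosen.

    Assumption: density is approximated by the inverse of the mean
    distance to the n_neighbors nearest neighbors. Needs len(points) >= 2.
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    density = []
    for i in range(n):
        nearest = sorted(d[i][j] for j in range(n) if j != i)[:n_neighbors]
        density.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    # First seed: the densest point.
    chosen = [max(range(n), key=lambda i: density[i])]
    while len(chosen) < k:
        # Later seeds: maximize density times distance to the nearest seed.
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: density[i] * min(d[i][c] for c in chosen),
        )
        chosen.append(best)
    return [points[i] for i in chosen]


def kmeans(points, centers, max_iter=100):
    """Standard K-means (Lloyd's iterations) started from the given centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    seeds = density_nn_init(docs, k=2)
    labels, centers = kmeans(docs, seeds)
    print(labels, centers)
```

Because the seeds are chosen deterministically, repeated runs on the same data give the same clustering, which is one plausible reading of the stability the abstract reports. For real text data the vectors would typically be term-weight vectors, and a different distance or similarity measure may be more appropriate.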

Key words: text clustering, density, nearest neighbor, F-measure


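ZHANG Wenming, WU Jiang, YUAN Xiaojiao. K-means text clustering algorithm based on density and nearest neighbor [J]. Journal of Computer Applications, 2010, 30(07): 1933-1935.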
