Journal of Computer Applications, 2023, Vol. 43, Issue 8: 2364-2369. DOI: 10.11772/j.issn.1001-9081.2022091356
The 19th International Conference on Web Information Systems and Applications (WISA 2022)
					
Shengwei MA1,2, Ruizhang HUANG1,2 (corresponding author), Lina REN1,2, Chuan LIN1,2
Received: 2022-09-12
Revised: 2022-10-13
Accepted: 2022-10-17
Online: 2022-12-26
Published: 2023-08-10
Contact: Ruizhang HUANG
About author: MA Shengwei, born in 1999 in Ziyun, Guizhou, M. S. candidate, CCF member. Her research interests include natural language processing and deep clustering.
Shengwei MA, Ruizhang HUANG, Lina REN, Chuan LIN. Structured deep text clustering model based on multi-layer semantic fusion[J]. Journal of Computer Applications, 2023, 43(8): 2364-2369.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091356
| Dataset | Samples | Dimensions | Classes |
|---|---|---|---|
| Citeseer | 3,327 | 3,073 | 6 |
| Acm | 3,025 | 1,870 | 3 |
| Dblp | 4,058 | 334 | 4 |
| Reuters | 10,000 | 2,000 | 4 |
| Abstract | 4,306 | 10,000 | 3 |
Tab. 1 Dataset details
| Model | Citeseer ACC | Citeseer NMI | Citeseer ARI | Acm ACC | Acm NMI | Acm ARI | Reuters ACC | Reuters NMI | Reuters ARI | Dblp ACC | Dblp NMI | Dblp ARI | Abstract ACC | Abstract NMI | Abstract ARI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| k-means | 0.4313 | 0.2073 | 0.1554 | 0.6731 | 0.3276 | 0.3080 | 0.5402 | 0.4239 | 0.2858 | 0.3843 | 0.1119 | 0.0674 | 0.6918 | 0.3826 | 0.2769 |
| AE | 0.5708 | 0.2764 | 0.2931 | 0.8096 | 0.4657 | 0.5180 | 0.7134 | 0.4985 | 0.5571 | 0.5143 | 0.2540 | 0.1221 | 0.8521 | 0.5805 | 0.5989 |
| DEC | 0.5233 | 0.2821 | 0.2421 | 0.8473 | 0.5918 | 0.6098 | 0.7312 | 0.5026 | 0.5486 | 0.5816 | 0.2951 | 0.2392 | 0.8687 | 0.6036 | 0.6412 |
| GAE | 0.6135 | 0.3463 | 0.3355 | 0.8452 | 0.5538 | 0.5946 | 0.5440 | 0.2592 | 0.1961 | 0.6121 | 0.3080 | 0.2202 | 0.8737 | 0.5900 | 0.6532 |
| SDCN | 0.6280 | 0.3489 | 0.3582 | 0.8932 | 0.6491 | 0.7103 | 0.7589 | 0.4767 | 0.5159 | 0.6695 | 0.3167 | 0.3343 | 0.9303 | 0.7290 | 0.7911 |
| AGCN | 0.6270 | 0.3608 | 0.3697 | 0.9038 | 0.6823 | 0.7369 | 0.7679 | 0.5142 | 0.5380 | 0.6672 | 0.3324 | 0.3466 | 0.9343 | 0.7465 | 0.8109 |
| SDCMS | 0.6637 | 0.3966 | 0.3998 | 0.9183 | 0.7228 | 0.7745 | 0.7851 | 0.5284 | 0.5612 | 0.6697 | 0.3486 | 0.3199 | 0.9438 | 0.7814 | 0.8362 |
Tab. 2 Comparison of clustering results of different models
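Tab. 2 reports ACC, NMI, and ARI. As a rough illustration of how such scores are commonly computed from ground-truth labels and predicted cluster ids, the following pure-Python sketch may help. It is not the authors' evaluation code. The function names are hypothetical, ACC is found by brute-force search over one-to-one cluster-to-class mappings (workable only for a handful of clusters, such as the 3 to 6 classes in Tab. 1), and NMI is normalized by the arithmetic mean of the two entropies, which is an assumption because the page does not state the normalization used.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(y_true, y_pred):
    """ACC: best fraction of matches over one-to-one cluster-to-class maps.

    Brute force over permutations (an assumption made for brevity);
    a Hungarian solver would be needed for many clusters.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("sketch assumes no more clusters than classes")
    pair_counts = Counter(zip(y_pred, y_true))
    best = 0
    for assigned in permutations(classes, len(clusters)):
        hits = sum(pair_counts[(c, k)] for c, k in zip(clusters, assigned))
        best = max(best, hits)
    return best / len(y_true)


def normalized_mutual_info(y_true, y_pred):
    """NMI with arithmetic-mean normalization (assumed, not from the paper)."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    n_true = Counter(y_true)
    n_pred = Counter(y_pred)
    mi = sum(c / n * log(c * n / (n_true[a] * n_pred[b]))
             for (a, b), c in joint.items())
    h_true = -sum(c / n * log(c / n) for c in n_true.values())
    h_pred = -sum(c / n * log(c / n) for c in n_pred.values())
    denom = (h_true + h_pred) / 2
    return mi / denom if denom > 0 else 1.0


def adjusted_rand_index(y_true, y_pred):
    """ARI computed from pair counts of the contingency table."""
    n = len(y_true)
    if n < 2:
        return 1.0
    sum_joint = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)
```

All three scores ignore the actual cluster ids: a prediction that merely renames the clusters, such as `[1, 1, 0, 0, 2, 2]` against `[0, 0, 1, 1, 2, 2]`, is treated as a perfect result.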
| Dataset | SDCMS | SDCMS-d |
|---|---|---|
| Citeseer | 0.6637 | 0.6415 |
| Acm | 0.9183 | 0.9074 |
| Reuters | 0.7851 | 0.7245 |
| Dblp | 0.6697 | 0.6389 |
Tab. 3 Clustering accuracy of different structures
| Dataset | Layers | ACC | NMI | ARI |
|---|---|---|---|---|
| Citeseer | 0 | 0.6180 | 0.3489 | 0.3582 |
| Citeseer | 1 | 0.6279 | 0.3634 | 0.3648 |
| Citeseer | 4 | 0.6637 | 0.3966 | 0.3998 |
| Acm | 0 | 0.8932 | 0.6491 | 0.7103 |
| Acm | 1 | 0.9071 | 0.7085 | 0.7486 |
| Acm | 4 | 0.9174 | 0.7182 | 0.7715 |
| Reuters | 0 | 0.7589 | 0.4767 | 0.5159 |
| Reuters | 1 | 0.7546 | 0.5000 | 0.4887 |
| Reuters | 4 | 0.7851 | 0.5284 | 0.5612 |
Tab. 4 Clustering accuracy of different GCN layer numbers
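Tab. 4 varies the number of GCN layers. For readers unfamiliar with what one such layer computes, here is a minimal plain-Python sketch of the widely used propagation rule from Kipf and Welling, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). That SDCMS layers follow this basic form is an assumption; the model's semantic fusion step is not shown, and all function names are hypothetical.

```python
from math import sqrt


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense, non-negative adjacency."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
             for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm, features, weight):
    """One propagation step: ReLU(adj_norm @ features @ weight)."""
    out = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def gcn_stack(adj, features, weights):
    """Apply one GCN layer per weight matrix; an empty list applies none."""
    adj_norm = normalize_adjacency(adj)
    hidden = features
    for weight in weights:
        hidden = gcn_layer(adj_norm, hidden, weight)
    return hidden
```

Each extra layer lets a node aggregate information from neighbors one hop further away, which is one common motivation for comparing different layer counts.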