Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (8): 2364-2369. DOI: 10.11772/j.issn.1001-9081.2022091356
• The 19th International Conference on Web Information Systems and Applications (WISA 2022) •
Structured deep text clustering model based on multi-layer semantic fusion
Shengwei MA1,2, Ruizhang HUANG1,2, Lina REN1,2, Chuan LIN1,2
Received: 2022-09-12
Revised: 2022-10-13
Accepted: 2022-10-17
Online: 2022-12-26
Published: 2023-08-10
Contact: Ruizhang HUANG
About author: MA Shengwei, born in 1999 in Ziyun, Guizhou, M. S. candidate, CCF member. Her research interests include natural language processing and deep clustering.
Shengwei MA, Ruizhang HUANG, Lina REN, Chuan LIN. Structured deep text clustering model based on multi-layer semantic fusion[J]. Journal of Computer Applications, 2023, 43(8): 2364-2369.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091356
| Dataset | Samples | Dimensions | Classes | Dataset | Samples | Dimensions | Classes |
|---|---|---|---|---|---|---|---|
| Citeseer | 3 327 | 3 073 | 6 | Reuters | 10 000 | 2 000 | 4 |
| ACM | 3 025 | 1 870 | 3 | Abstract | 4 306 | 10 000 | 3 |
| DBLP | 4 058 | 334 | 4 | | | | |
Tab. 1 Dataset details
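For orientation, the following is a minimal sketch of how a flat-feature baseline such as the k-means row in Tab. 2 could be run on data with the shapes listed in Tab. 1 (here Citeseer: 3 327 samples, 3 073 features, 6 classes). It assumes scikit-learn and uses random placeholder features; it is not the authors' pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative shapes from Tab. 1 (Citeseer): 3 327 samples, 3 073 features, 6 classes.
# X is a random placeholder; in practice it would hold the bag-of-words / TF-IDF matrix.
rng = np.random.default_rng(0)
X = rng.random((3327, 3073))

# Plain k-means with as many clusters as gold classes, as in the k-means baseline of Tab. 2.
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)
print(labels.shape)  # (3327,)
```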
| Model | Citeseer (ACC / NMI / ARI) | ACM (ACC / NMI / ARI) | Reuters (ACC / NMI / ARI) | DBLP (ACC / NMI / ARI) | Abstract (ACC / NMI / ARI) |
|---|---|---|---|---|---|
| k-means | 0.4313 / 0.2073 / 0.1554 | 0.6731 / 0.3276 / 0.3080 | 0.5402 / 0.4239 / 0.2858 | 0.3843 / 0.1119 / 0.0674 | 0.6918 / 0.3826 / 0.2769 |
| AE | 0.5708 / 0.2764 / 0.2931 | 0.8096 / 0.4657 / 0.5180 | 0.7134 / 0.4985 / 0.5571 | 0.5143 / 0.2540 / 0.1221 | 0.8521 / 0.5805 / 0.5989 |
| DEC | 0.5233 / 0.2821 / 0.2421 | 0.8473 / 0.5918 / 0.6098 | 0.7312 / 0.5026 / 0.5486 | 0.5816 / 0.2951 / 0.2392 | 0.8687 / 0.6036 / 0.6412 |
| GAE | 0.6135 / 0.3463 / 0.3355 | 0.8452 / 0.5538 / 0.5946 | 0.5440 / 0.2592 / 0.1961 | 0.6121 / 0.3080 / 0.2202 | 0.8737 / 0.5900 / 0.6532 |
| SDCN | 0.6280 / 0.3489 / 0.3582 | 0.8932 / 0.6491 / 0.7103 | 0.7589 / 0.4767 / 0.5159 | 0.6695 / 0.3167 / 0.3343 | 0.9303 / 0.7290 / 0.7911 |
| AGCN | 0.6270 / 0.3608 / 0.3697 | 0.9038 / 0.6823 / 0.7369 | 0.7679 / 0.5142 / 0.5380 | 0.6672 / 0.3324 / 0.3466 | 0.9343 / 0.7465 / 0.8109 |
| SDCMS | 0.6637 / 0.3966 / 0.3998 | 0.9183 / 0.7228 / 0.7745 | 0.7851 / 0.5284 / 0.5612 | 0.6697 / 0.3486 / 0.3199 | 0.9438 / 0.7814 / 0.8362 |
Tab. 2 Comparison of clustering results of different models
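The ACC, NMI and ARI values in Tab. 2 are standard clustering metrics. As a minimal sketch (using scikit-learn and SciPy, not the authors' evaluation code), they can be computed from gold labels and predicted cluster assignments as follows; ACC uses the Hungarian algorithm to find the best one-to-one mapping between clusters and classes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of samples correct under the best cluster-to-class mapping."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                                     # class/cluster co-occurrence counts
    rows, cols = linear_sum_assignment(cost.max() - cost)   # Hungarian matching (maximize matches)
    return cost[rows, cols].sum() / y_true.size

# Toy example: gold labels vs. cluster assignments from any model in Tab. 2.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])
print(clustering_accuracy(y_true, y_pred))            # 1.0
print(normalized_mutual_info_score(y_true, y_pred))   # 1.0
print(adjusted_rand_score(y_true, y_pred))            # 1.0
```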
| Dataset | SDCMS | SDCMS-d | Dataset | SDCMS | SDCMS-d |
|---|---|---|---|---|---|
| Citeseer | 0.6637 | 0.6415 | Reuters | 0.7851 | 0.7245 |
| ACM | 0.9183 | 0.9074 | DBLP | 0.6697 | 0.6389 |
Tab. 3 Clustering accuracy of different structures
| Dataset | GCN layers | ACC | NMI | ARI |
|---|---|---|---|---|
| Citeseer | 0 | 0.6180 | 0.3489 | 0.3582 |
| Citeseer | 1 | 0.6279 | 0.3634 | 0.3648 |
| Citeseer | 4 | 0.6637 | 0.3966 | 0.3998 |
| ACM | 0 | 0.8932 | 0.6491 | 0.7103 |
| ACM | 1 | 0.9071 | 0.7085 | 0.7486 |
| ACM | 4 | 0.9174 | 0.7182 | 0.7715 |
| Reuters | 0 | 0.7589 | 0.4767 | 0.5159 |
| Reuters | 1 | 0.7546 | 0.5000 | 0.4887 |
| Reuters | 4 | 0.7851 | 0.5284 | 0.5612 |
Tab. 4 Clustering accuracy of different GCN layer numbers
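Tab. 4 varies the number of GCN layers stacked on the learned representation. The snippet below is a minimal PyTorch sketch of such a stacked GCN encoder with configurable depth; the class names, dimensions and final softmax assignment are illustrative assumptions, not the SDCMS implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution: H' = act(A_norm @ H @ W), A_norm being the normalized adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, h, a_norm, activate=True):
        h = a_norm @ (h @ self.weight)
        return F.relu(h) if activate else h

class StackedGCN(nn.Module):
    """Stack of GCN layers; the depth corresponds to the layer counts compared in Tab. 4."""
    def __init__(self, dims):  # e.g. dims = [input_dim, 500, 500, 2000, n_clusters] (illustrative)
        super().__init__()
        self.layers = nn.ModuleList([GCNLayer(d_in, d_out)
                                     for d_in, d_out in zip(dims[:-1], dims[1:])])

    def forward(self, x, a_norm):
        h = x
        for i, layer in enumerate(self.layers):
            h = layer(h, a_norm, activate=(i < len(self.layers) - 1))
        return F.softmax(h, dim=1)  # soft cluster assignment per node/document
```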