Journal of Computer Applications, 2025, Vol. 45, Issue (7): 2145-2152. DOI: 10.11772/j.issn.1001-9081.2024070931
• The 39th CCF National Conference of Computer Applications (CCF NCCA 2024) •
Le XU1,2,3, Ruizhang HUANG1,2,3, Ruina BAI1,2,3, Yongbin QIN1,2,3
Received: 2024-07-01
Revised: 2024-09-25
Accepted: 2024-10-09
Online: 2025-07-10
Published: 2025-07-10
Contact: Ruizhang HUANG
About author: XU Le, born in 1999 in Luzhou, Sichuan, M. S. candidate, CCF member. Her research interests include natural language processing, text mining, and machine learning.
Le XU, Ruizhang HUANG, Ruina BAI, Yongbin QIN. Deep semi-supervised text clustering with intentional regularization[J]. Journal of Computer Applications, 2025, 45(7): 2145-2152.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2024070931
Name | Classes | Documents | Dimensions |
---|---|---|---|
Reu-10k | 4 | 10 000 | 2 000 |
BBC | 5 | 2 225 | 9 635 |
ACM | 3 | 3 025 | 1 870 |
Abstract | 3 | 4 306 | 10 000 |
Tab. 1 Information of individual datasets
Dataset | Metric | K-means | AE | DEC | IDEC | Cop-Kmeans | SDEC | CPAC | S4NMF | IRDSTC |
---|---|---|---|---|---|---|---|---|---|---|
Reu-10k | ACC | 54.04 | 71.16 | 72.75 | 75.03 | 65.84 | 72.80 | 72.79 | 61.99 | 81.27 |
Reu-10k | NMI | 41.28 | 48.41 | 51.62 | 51.98 | 42.53 | 48.88 | 48.32 | 41.26 | 66.67 |
Reu-10k | ARI | 27.15 | 54.82 | 57.97 | 58.26 | 50.49 | 53.91 | 54.61 | 28.92 | 62.87 |
BBC | ACC | 51.58 | 57.03 | 60.45 | 76.40 | 69.16 | 64.80 | 67.01 | 78.83 | 94.79 |
BBC | NMI | 30.88 | 53.88 | 57.19 | 64.48 | 53.65 | 51.02 | 51.01 | 56.41 | 85.49 |
BBC | ARI | 20.50 | 41.13 | 47.15 | 69.10 | 49.78 | 44.60 | 41.08 | 53.29 | 87.93 |
ACM | ACC | 67.37 | 83.83 | 84.74 | 85.13 | 75.02 | 85.53 | 82.09 | 66.94 | 92.30 |
ACM | NMI | 33.54 | 51.83 | 54.85 | 56.16 | 48.71 | 55.37 | 56.42 | 27.83 | 71.40 |
ACM | ARI | 33.85 | 57.74 | 59.92 | 62.16 | 51.39 | 61.43 | 59.39 | 28.54 | 78.34 |
Abstract | ACC | 69.18 | 80.14 | 86.32 | 83.83 | 77.63 | 90.78 | 89.62 | 82.28 | 95.91 |
Abstract | NMI | 38.30 | 54.36 | 59.08 | 60.98 | 65.14 | 68.14 | 66.65 | 52.65 | 82.29 |
Abstract | ARI | 27.60 | 51.52 | 62.69 | 62.03 | 70.03 | 74.05 | 66.97 | 53.87 | 87.97 |
Tab. 2 Results of comparison experiments
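The three metrics reported in the table (ACC, NMI, ARI) can be computed from ground-truth and predicted labels roughly as sketched below. This is a minimal pure-Python illustration, not the authors' evaluation code: the function names are hypothetical, ACC is found by brute-force search over label mappings (workable for the 3 to 5 classes here; the Hungarian algorithm is the usual choice for larger class counts), and the NMI normalization by the arithmetic mean of the two entropies is an assumption, since the page does not state which variant was used.

```python
from collections import Counter
from itertools import permutations
from math import comb, log


def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one mapping of clusters to classes."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad the shorter label set with None so a one-to-one mapping always exists.
    size = max(len(classes), len(clusters))
    classes += [None] * (size - len(classes))
    clusters += [None] * (size - len(clusters))
    overlap = Counter(zip(y_pred, y_true))
    # Brute force over all mappings (assumption: small number of classes).
    best = max(
        sum(overlap[(c, k)] for c, k in zip(clusters, perm))
        for perm in permutations(classes)
    )
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """NMI, normalized by the arithmetic mean of the entropies (assumption)."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    count_t, count_p = Counter(y_true), Counter(y_pred)
    mi = sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


def adjusted_rand_index(y_true, y_pred):
    """ARI computed from pair counts in the contingency table."""
    n = len(y_true)
    if n < 2:
        return 1.0
    joint = Counter(zip(y_true, y_pred))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_t = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_p = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_t * sum_p / comb(n, 2)
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    pred = [2, 2, 2, 0, 0, 1, 1, 1, 1]
    for name, fn in [("ACC", clustering_accuracy),
                     ("NMI", normalized_mutual_info),
                     ("ARI", adjusted_rand_index)]:
        # Multiply by 100 to match the percentage scale used in the table.
        print(f"{name}: {100 * fn(truth, pred):.2f}")
```

All three scores are invariant to how cluster IDs are numbered, which is why a clustering can be compared against class labels without any prior alignment.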
| Optimization objective | Reu-10k ACC | Reu-10k NMI | Reu-10k ARI | BBC ACC | BBC NMI | BBC ARI | ACM ACC | ACM NMI | ACM ARI | Abstract ACC | Abstract NMI | Abstract ARI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 72.75 | 51.62 | 57.97 | 60.45 | 57.19 | 47.51 | 84.74 | 54.85 | 59.92 | 86.32 | 59.08 | 62.69 |
|  | 80.43 | 66.47 | 61.59 | 93.44 | 84.13 | 83.85 | 91.64 | 69.44 | 76.59 | 95.84 | 82.03 | 87.77 |
|  | 73.46 | 53.12 | 59.61 | 63.55 | 60.25 | 50.50 | 88.40 | 62.72 | 68.79 | 88.32 | 63.50 | 68.41 |
|  | 81.27 | 66.67 | 62.87 | 94.79 | 85.49 | 87.93 | 92.30 | 71.40 | 78.34 | 95.91 | 82.29 | 87.97 |
Tab. 3 Results of ablation experiments