Deep semi-supervised text clustering with intentional regularization
Le XU, Ruizhang HUANG, Ruina BAI, Yongbin QIN
Journal of Computer Applications, 2025, 45(7): 2145-2152. DOI: 10.11772/j.issn.1001-9081.2024070931

Existing semi-supervised text clustering methods do not take user intent into account in representation learning and clustering at the same time. To address this, a Deep Semi-supervised Text Clustering with Intentional Regularization (IRDSTC) model is proposed. It introduces an intention regularization strategy, implemented by two modules: Intention Regularized Representation Learning (IRRL) and Intention Regularized Clustering (IRC). First, an intent matrix is constructed from the intent constraint information provided by the user, capturing the user's expectations about the relationships between texts. Second, this matrix is applied in both the representation learning stage and the clustering stage. In the representation learning stage, the intermediate-layer representation extracted by the deep model is converted into a representation correlation matrix, which is combined with the intent matrix to form a regularization term, so that user intent drives representation learning. In the clustering stage, an allocation consistency matrix is built from the cluster assignment probabilities obtained during clustering iterations and is combined with the intent matrix to form a further regularization term, so that user intent guides the clustering process. Experimental results on the Reu-10k, BBC, ACM, and Abstract datasets show that IRDSTC outperforms the other clustering methods compared in clustering Accuracy (ACC), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI). Specifically, compared with Improved Deep Embedding Clustering (IDEC), IRDSTC increases NMI by 28.26%, 32.58%, 27.13%, and 34.94% on the four datasets, respectively, indicating a better clustering effect.
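
The abstract does not give the formulas behind the two regularization terms, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes that user intent arrives as must-link and cannot-link pairs, that the representation correlation matrix is cosine similarity, that the allocation consistency matrix is the product of the soft assignment matrix with its transpose, and that the regularizer is a mean squared gap on constrained pairs. All function names are hypothetical.

```python
import numpy as np

def build_intent_matrix(n, must_link, cannot_link):
    """Assumed encoding: +1 for pairs the user wants together,
    -1 for pairs the user wants apart, 0 where no intent is given."""
    M = np.zeros((n, n))
    for i, j in must_link:
        M[i, j] = M[j, i] = 1.0
    for i, j in cannot_link:
        M[i, j] = M[j, i] = -1.0
    return M

def representation_correlation(H):
    """Cosine similarity between the rows of H, where H holds the
    intermediate-layer representations (one row per text)."""
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    Hn = H / np.clip(norms, 1e-12, None)  # guard against zero rows
    return Hn @ Hn.T

def allocation_consistency(Q):
    """Q is an (n, k) matrix of soft cluster assignment probabilities.
    Entry (i, j) of Q @ Q.T is large when texts i and j share a cluster."""
    return Q @ Q.T

def intent_regularizer(S, M):
    """Assumed penalty: mean squared gap between a pairwise matrix S and
    the target (1 for must-link, 0 for cannot-link) on constrained pairs."""
    mask = M != 0
    if not mask.any():
        return 0.0
    target = (M > 0).astype(float)
    return float(np.mean((S[mask] - target[mask]) ** 2))

# Toy usage: three texts, user wants 0 and 1 together and 0 and 2 apart.
M = build_intent_matrix(3, must_link=[(0, 1)], cannot_link=[(0, 2)])
H = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # hidden representations
Q = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])   # soft assignments
loss_repr = intent_regularizer(representation_correlation(H), M)
loss_clu = intent_regularizer(allocation_consistency(Q), M)
```

In a training loop, terms like `loss_repr` and `loss_clu` would be added, with weights, to the reconstruction and clustering losses of the underlying deep model; the weighting scheme is not stated in the abstract.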
