
WISA2024+70 Deep Variational Text Clustering Model Based on Distribution Augmentation

Shen Ao1, Huang Ruizhang1, Xue Jingjing2, Chen Yanping2, Qin Yongbin1

  1. Guizhou University
  2. College of Computer Science and Technology, Guizhou University
  • Received: 2024-08-06 Revised: 2024-08-25 Online: 2024-09-12
  • Corresponding author: Huang Ruizhang
  • Funding:
    Guizhou Provincial Science and Technology Support Program

Abstract: Deep variational text clustering models suffer from missing distribution information and distribution collapse in practical applications. To address these problems, this study proposes a deep variational text clustering model based on distribution augmentation (DAVAE). The model augments distribution information by integrating an augmented latent semantic distribution into the original latent semantic distribution, which improves the completeness and accuracy of the latent distribution. The model also applies a distribution consistency constraint that encourages it to learn consistent semantic representations, so that the learned semantic distribution better expresses the true information in the data and clustering performance improves. Compared with existing deep variational inference models and deep clustering models, DAVAE improves normalized mutual information (NMI) by 0.16, 9.01, 2.30, and 2.72 percentage points on four real-world datasets, Abstract, BBC, Reuters-10k, and BBCSports, respectively, which validates the effectiveness of the model.

Key words: deep text clustering, distribution augmentation, variational autoencoder, semantic representation, distribution consistency constraint
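
For readers unfamiliar with the NMI metric reported in the abstract, the following is a minimal sketch of how it can be computed from ground-truth labels and predicted cluster assignments. It is not taken from the paper: the function names are illustrative, and the arithmetic-mean normalization, NMI = 2·I(U;V) / (H(U) + H(V)), is an assumption, since the abstract does not state which normalization the authors use.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(n * nij / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0.0 and hb == 0.0:
        # Both assignments put every item in one cluster: treat as identical.
        return 1.0
    return 2.0 * mutual_information(a, b) / (ha + hb)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(nmi(truth, [1, 1, 0, 0]))  # same partition, labels permuted
    print(nmi(truth, [0, 1, 0, 1]))  # partition independent of the truth
```

NMI is invariant to how cluster labels are numbered, which is why it is commonly used to compare clusterings against class labels. A reported gain of 9.01 percentage points corresponds to a difference of 0.0901 on this 0 to 1 scale.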

CLC Number: