Journal of Computer Applications (《计算机应用》) official website


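Deep event clustering method based on event representation and contrastive learning

JIANG Xiaoxia1, HUANG Ruizhang1, BAI Ruina1, REN Lina2, CHEN Yanping2

  1. Guizhou University
  2. College of Computer Science and Technology, Guizhou University
  • Received: 2023-07-01 Revised: 2023-08-03 Online: 2023-08-23 Published: 2023-08-23
  • Corresponding author: JIANG Xiaoxia
  • Funding:
    Research on deep text clustering methods for multi-view text; Research on deep text clustering methods for online public opinion data in higher vocational colleges; Research on key technologies for Internet content risk control based on text computing and industry knowledge graphs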



Abstract: Existing deep clustering methods do not take event information and its structural characteristics into account, so they struggle to divide events into types effectively. To address this problem, a Deep Event Clustering method based on Event Representation and Contrastive Learning (DEC_ERCL) was proposed. Firstly, information recognition was used to identify structured event information in unstructured text, which prevents redundant information from affecting event semantics. Secondly, the structural information of events was integrated into an autoencoder to learn low-dimensional dense event representations, which served as the basis for downstream clustering. Finally, to model the subtle differences between events effectively, a contrastive loss with multiple positive examples was added to the feature learning process. Experimental results on the datasets DuEE, FewFC, Military (a domain event dataset built from open-source military news) and ACE2005 show that the proposed method outperforms other deep clustering methods in both accuracy and Normalized Mutual Information (NMI). Compared with the second-best method, IDEC (Improved Deep Embedded Clustering), DEC_ERCL improves clustering accuracy by 21.34%, 26.46%, 7.36% and 39.97% on the four datasets respectively, indicating that DEC_ERCL achieves better event clustering.

Key words: deep clustering, text clustering, event representation, event structure, contrastive learning
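The contrastive loss with multiple positive examples described in the abstract can be illustrated with a minimal sketch in the style of the supervised contrastive (SupCon) loss, where positives are averaged outside the log and every other sample sharing an anchor's group label counts as a positive. The abstract does not say how DEC_ERCL builds its positive sets or which temperature it uses, so the `labels` grouping, the function names and the default `temperature=0.5` below are assumptions for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def multi_positive_contrastive_loss(embeddings, labels, temperature=0.5):
    """SupCon-style loss with several positives per anchor (illustrative sketch).

    embeddings: list of equal-length float vectors, one per event.
    labels: hypothetical grouping; samples sharing a label are treated as positives.
    temperature: softmax temperature tau (assumed value, not taken from the paper).
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives is skipped and not counted
        # temperature-scaled similarity of anchor i to every other sample
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        # log of the softmax denominator, computed stably via log-sum-exp
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # mean negative log-probability over all positives of this anchor
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With well-separated embeddings, a grouping that matches the geometry yields a lower loss than a mismatched one; minimising this loss is what pulls positives together and pushes the remaining samples apart during feature learning.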
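One common way to turn autoencoder embeddings into clusters is a Student's t soft assignment combined with a sharpened target distribution and a KL-divergence loss. This is the mechanism used by DEC and by the IDEC baseline named in the abstract. The abstract does not state that DEC_ERCL uses this exact clustering objective, so the sketch below (pure Python, hypothetical function names, `alpha=1.0` as in DEC) only shows how such a clustering layer operates on learned representations.

```python
import math


def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment q_ij between embeddings and cluster centres."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            dist2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0))
        s = sum(row)
        q.append([v / s for v in row])  # each row sums to 1
    return q


def target_distribution(q):
    """Sharpened target p_ij = (q_ij^2 / f_j) / sum_k (q_ik^2 / f_k), f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster frequencies
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def kl_clustering_loss(p, q):
    """KL(P || Q) summed over samples and clusters; minimised to refine clusters."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)
```

Hard cluster labels are the argmax of each row of `q`. Because `p` squares and renormalises `q`, training `q` towards `p` reinforces confident assignments, while dividing by `f_j` keeps large clusters from absorbing everything.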
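The two evaluation indexes reported, clustering accuracy (ACC) and normalized mutual information (NMI), can be computed as sketched below. ACC is assumed to use the usual best one-to-one mapping between predicted clusters and gold event types. This sketch brute-forces that mapping, which is only feasible for a handful of clusters; the Hungarian algorithm is the standard choice otherwise. NMI is normalised here by the arithmetic mean of the two entropies, which is one of several conventions. The abstract does not give the paper's exact normalisation.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of samples correct under the best one-to-one cluster-to-class mapping.

    Brute-forces all mappings, so it is only practical for small numbers of clusters.
    """
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # pad with dummy classes so every cluster can be mapped when clusters > classes
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    pairs = Counter(zip(y_pred, y_true))  # (cluster, class) co-occurrence counts
    best = 0
    for perm in permutations(targets, len(clusters)):
        hits = sum(pairs[(c, t)] for c, t in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """NMI with arithmetic-mean normalisation: I(U;V) / ((H(U) + H(V)) / 2)."""
    n = len(y_true)
    cu, cv = Counter(y_true), Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum((nij / n) * math.log(n * nij / (cu[u] * cv[v]))
             for (u, v), nij in joint.items())
    hu, hv = entropy(y_true), entropy(y_pred)
    if hu == 0 and hv == 0:
        return 1.0  # both assignments are a single cluster
    return mi / ((hu + hv) / 2)
```

Both scores are invariant to how cluster IDs are numbered: relabelling a perfect clustering, for example predicting `[2, 2, 0, 0, 1, 1]` for gold types `[0, 0, 1, 1, 2, 2]`, still gives ACC = 1.0 and NMI = 1.0.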
