Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (6): 1734-1742. DOI: 10.11772/j.issn.1001-9081.2023060851

Special Issue: The 38th CCF National Conference of Computer Applications (CCF NCCA 2023)

Deep event clustering method based on event representation and contrastive learning

Xiaoxia JIANG1,2,3, Ruizhang HUANG1,2,3, Ruina BAI1,2,3, Lina REN1,2,3, Yanping CHEN1,2,3

  1. State Key Laboratory of Public Big Data (Guizhou University), Guiyang Guizhou 550025, China
  2. Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry (Guizhou University), Guiyang Guizhou 550025, China
  3. College of Computer Science and Technology, Guizhou University, Guiyang Guizhou 550025, China
  • Received:2023-07-04 Revised:2023-08-03 Accepted:2023-08-08 Online:2023-08-23 Published:2024-06-10
  • Contact: Ruizhang HUANG
  • About author:JIANG Xiaoxia, born in 1998, M.S. candidate. Her research interests include natural language processing, text mining, machine learning.
    BAI Ruina, born in 1994, Ph. D. candidate. Her research interests include natural language processing, multi-view learning.
    REN Lina, born in 1987, Ph. D. candidate. Her research interests include natural language processing, text mining, machine learning.
    CHEN Yanping, born in 1980, Ph. D., professor. His research interests include artificial intelligence, natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62066007); Vocational Education Research Project of Guizhou Provincial Department of Education (GZZJ-Q2022028); Science and Technology Support Program of Guizhou Province ([2023]300)

基于事件表示和对比学习的深度事件聚类方法

蒋小霞1,2,3, 黄瑞章1,2,3, 白瑞娜1,2,3, 任丽娜1,2,3, 陈艳平1,2,3

  1. 公共大数据国家重点实验室(贵州大学), 贵阳 550025
  2. 文本计算与认知智能教育部工程研究中心(贵州大学), 贵阳 550025
  3. 贵州大学 计算机科学与技术学院, 贵阳 550025
  • 通讯作者: 黄瑞章
  • 作者简介:蒋小霞(1998—),女,贵州安顺人,硕士研究生,主要研究方向:自然语言处理、文本挖掘、机器学习
    白瑞娜(1994—),女,内蒙古包头人,博士研究生,主要研究方向:自然语言处理、多视图学习
    任丽娜(1987—),女,辽宁阜新人,博士研究生,主要研究方向:自然语言处理、文本挖掘、机器学习
    陈艳平(1980—),男,贵州长顺人,教授,博士,CCF会员,主要研究方向:人工智能、自然语言处理。
  • 基金资助:
    国家自然科学基金资助项目(62066007);贵州省教育厅职业教育科研项目(GZZJ-Q2022028);贵州省科技支撑计划项目(黔科合支撑[2023]一般300)

Abstract:

Existing deep clustering methods struggle to partition event types effectively because they do not consider event information and its structural characteristics. To address this problem, a Deep Event Clustering method based on Event Representation and Contrastive Learning (DEC_ERCL) was proposed. First, information recognition was used to identify structured event information in unstructured text, which avoided the influence of redundant information on event semantics. Second, the structural information of events was integrated into an autoencoder to learn low-dimensional dense event representations, which served as the basis for downstream clustering. Finally, to model the subtle differences between events effectively, a contrastive loss with multiple positive examples was added to the feature learning process. Experimental results on the DuEE, FewFC, Military and ACE2005 datasets show that the proposed method outperforms other deep clustering methods in both accuracy and Normalized Mutual Information (NMI). Compared with the second-best method, DEC_ERCL improves clustering accuracy by 17.85%, 9.26%, 7.36% and 33.54% on the four datasets, respectively, indicating that DEC_ERCL clusters events more effectively.

Key words: deep clustering, text clustering, event representation, event structure, contrastive learning
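
The abstract mentions a contrastive loss with multiple positive examples but does not give its form. As an illustration only, the following is a minimal sketch of one common multi-positive formulation (a supervised-contrastive-style loss), not necessarily the loss used in DEC_ERCL. The function names, the `groups` argument that marks which samples count as positives for each other, the use of cosine similarity, and the temperature value are all assumptions introduced for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_contrastive_loss(embeddings, groups, temperature=0.5):
    """Average multi-positive contrastive loss over anchors with positives.

    embeddings: list of vectors (hypothetical event representations).
    groups: list of ids; samples sharing an id are treated as positives
            for each other (an assumption made for this sketch).
    """
    n = len(embeddings)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and groups[j] == groups[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarities to every other sample.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability over all positives of this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

Under this formulation, each anchor is pulled toward all of its positives at once rather than toward a single one, so the loss is lower when samples in the same group lie close together and samples in different groups lie apart.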

摘要:

针对现有深度聚类方法不考虑事件信息及其结构特点而难以有效划分事件类型的问题,提出一种基于事件表示和对比学习的深度事件聚类方法(DEC_ERCL)。首先,利用信息识别手段从非结构化文本中识别结构化的事件信息,避免冗余信息对事件语义的影响;其次,将事件的结构信息集成于自编码器中学习低维稠密的事件表示,并以此作为下游聚类划分的依据;最后,为有效建模事件之间的细微差异,在特征学习过程中加入多正例对比损失。在数据集DuEE、FewFC、Military和ACE2005上的实验结果表明,相较于其他深度聚类方法,所提方法在准确率和标准化互信息(NMI)评价指标上均表现更好;相较于次优的方法,DEC_ERCL的聚类准确率分别提升了17.85%、9.26%、7.36%和33.54%,表明了DEC_ERCL具有更好的事件聚类效果。

关键词: 深度聚类, 文本聚类, 事件表示, 事件结构, 对比学习

CLC Number: