Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (9): 2667-2673. DOI: 10.11772/j.issn.1001-9081.2021071330

• Artificial Intelligence •

Key event extraction method from microblog by integrating social influence and temporal distribution

Xujian ZHAO, Chongwei WANG, Junli WANG

  1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang Sichuan 621010, China
  • Received: 2021-07-26 Revised: 2021-09-14 Accepted: 2021-09-15 Online: 2021-09-18 Published: 2022-09-10
  • Contact: Xujian ZHAO
  • About author: WANG Chongwei, born in 1995 in Luzhou, Sichuan, M. S. candidate. His research interests include information extraction and machine learning.
    WANG Junli, born in 1996 in Nanchong, Sichuan, M. S. candidate. His research interests include information extraction and machine learning.
  • Supported by:
    Humanities and Social Sciences Foundation of Ministry of Education (17YJCZH260); Key Program of Science and Technology Department of Sichuan Province (2020YFS0057); CERNET Next Generation Internet Technology Innovation Project (NGII20180403)

Abstract:

Existing microblog event extraction methods rely on the content features of events and ignore the relationship between an event's social attributes and its temporal characteristics, so they cannot identify the key events in the propagation of a microblog hot topic. To address this problem, a key event extraction method for microblogs that integrates social influence and temporal distribution is proposed. First, social influence is modeled to characterize the importance of microblog events. Second, the temporal characteristics of microblog events during their evolution are incorporated to capture how events differ under different temporal distributions. Finally, the key microblog events under each temporal distribution are extracted. Experimental results on real datasets show that the proposed method effectively extracts the key events in microblog hot topics. Compared with four baseline methods (random selection, Term Frequency-Inverse Document Frequency (TF-IDF), minimum-weight connected dominating set, and degree and clustering coefficient information), the proposed method improves ROUGE-1, the event set integrity index, by 21%, 18%, 26% and 30% respectively on dataset 1, and by 14%, 2%, 21% and 23% respectively on dataset 2, outperforming the traditional methods.

Key words: social influence, temporal distribution, microblog, event extraction, event evolution
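
The abstract reports its gains on ROUGE-1, used as the integrity index of the extracted event set. As a rough illustration of what that metric measures, the sketch below computes a recall-oriented ROUGE-1 (clipped unigram overlap divided by the number of reference unigrams). It is only a minimal sketch under stated assumptions: the function name `rouge_1_recall`, the whitespace tokenization, and the toy sentences are all hypothetical, and the paper may use a different ROUGE variant (for example an F-score) or a Chinese word segmenter.

```python
from collections import Counter


def rouge_1_recall(candidate_tokens, reference_tokens):
    """Clipped unigram recall of a candidate against a single reference.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not reference_tokens:
        return 0.0
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # A reference unigram is matched at most as many times as it
    # occurs in the candidate (clipping).
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


# Toy data (invented): a reference event description and an extracted one.
reference = "earthquake hits city rescue teams arrive".split()
candidate = "rescue teams arrive after earthquake".split()
score = rouge_1_recall(candidate, reference)
print(f"ROUGE-1 recall: {score:.3f}")
```

Under this definition, a higher score means that more of the words in the reference event set are covered by the extracted events, which is why it can serve as a completeness measure.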

CLC Number: