Journal of Computer Applications, 2021, Vol. 41, Issue (11): 3139-3144. DOI: 10.11772/j.issn.1001-9081.2021030451

• Artificial Intelligence

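Storyline extraction method from Weibo news based on graph convolutional network

Xujian ZHAO, Chongwei WANG

  1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China
  • Received: 2021-03-24 Revised: 2021-06-03 Accepted: 2021-06-03 Online: 2021-11-29 Published: 2021-11-10
  • Corresponding author: Xujian ZHAO
  • About the authors: ZHAO Xujian, born in 1984, male, from Mianyang, Sichuan, Ph. D., associate professor, CCF member. His research interests include text mining, natural language processing and Web information processing.
    WANG Chongwei, born in 1995, male, from Luzhou, Sichuan, M. S. candidate. His research interests include information extraction and machine learning.
  • Supported by:
    Humanities and Social Sciences Foundation of the Ministry of Education (17YJCZH260); Key Project of the Science and Technology Department of Sichuan Province (2020YFS0057); CERNET Next Generation Internet Technology Innovation Project (NGII20180403)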

Abstract:

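As a major platform through which people obtain and spread news, Weibo contains rich event information. Extracting storylines from Weibo data gives users an intuitive way to understand accurately how events evolve; however, the sparsity of Weibo data and its lack of context make storyline extraction challenging. Therefore, storylines were extracted automatically from Weibo data through two consecutive tasks: 1) events were modeled according to their propagation influence on Weibo, and the primary events were extracted; 2) a heterogeneous event graph was built from event features, and an Event Graph Convolutional Network (E-GCN) model was proposed to improve the learning of implicit relations between events, so that the story branch of each event could be predicted and the events linked. The method was evaluated on real-world datasets from two perspectives: story branches and storylines. In the story branch generation evaluation, the results show that the proposed method achieves an F1 score 28, 20 and 27 percentage points higher than the Bayesian model, Steiner tree and Story forest respectively on Dataset1, and 19, 12 and 22 percentage points higher on Dataset2. In the storyline extraction evaluation, its correct-edge accuracy is 33, 23 and 17 percentage points higher than that of Story timeline, Steiner tree and Story forest respectively on Dataset1, and 12, 3 and 9 percentage points higher on Dataset2.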
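The abstract does not give E-GCN's layer equations, so what follows is only a minimal sketch of the standard two-layer GCN forward pass (the symmetric-normalized rule H' = σ(D^-1/2 (A + I) D^-1/2 H W)). It is applied to a toy event graph to show how graph convolution lets each event's story-branch prediction draw on neighbouring events. Several elements are hypothetical: the toy adjacency matrix, the 8-dimensional event features, the three story branches, and the helper names `normalize_adjacency`, `gcn_layer` and `predict_branches`. The weights are random and untrained. The heterogeneous edge types of the paper's event graph are collapsed here into a single adjacency matrix, which is a simplification the paper's model need not share.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by the standard GCN."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops so every degree >= 1
    deg = a_hat.sum(axis=1)                   # node degrees including the self-loop
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray, activation: bool = True) -> np.ndarray:
    """One graph-convolution step: aggregate neighbour features, then project with w."""
    out = a_norm @ h @ w
    return np.maximum(out, 0.0) if activation else out   # ReLU on hidden layers only


def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = x - x.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def predict_branches(adj: np.ndarray, features: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Two-layer GCN forward pass: branch probabilities and predicted branch per event."""
    a_norm = normalize_adjacency(adj)
    hidden = gcn_layer(a_norm, features, w1)
    logits = gcn_layer(a_norm, hidden, w2, activation=False)
    probs = softmax(logits)
    return probs, probs.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy graph of 5 events; an edge means the events share an entity or keyword.
    adj = np.array([
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ], dtype=float)
    features = rng.normal(size=(5, 8))   # hypothetical 8-d event feature vectors
    w1 = rng.normal(size=(8, 16))        # untrained weights, shapes only
    w2 = rng.normal(size=(16, 3))        # 3 hypothetical story branches
    probs, branches = predict_branches(adj, features, w1, w2)
    print(branches)
```

Each layer multiplies by the normalized adjacency, so two layers mix information from events up to two hops away. In this toy graph, events 3 and 4 form a separate component and cannot influence the predictions for events 0 to 2.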

Key words: social network, Weibo, primary event, storyline, Graph Convolutional Network (GCN)
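The abstract reports its gains in F1 and correct-edge accuracy as percentage points but does not define the matching protocol. The sketch below assumes, purely for illustration, that a storyline is a set of directed edges between events. It also assumes that "correct-edge accuracy" is the share of output edges found in the gold storyline, which corresponds to the precision term. `edge_scores` and `percentage_point_gap` are hypothetical helpers. The second helper shows why a gap such as "28 percentage points" is an absolute difference: moving from 0.50 to 0.75 is a 25-point gap, even though it is a 50% relative improvement.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical representation: a storyline edge links an earlier event to a later one.
Edge = Tuple[Hashable, Hashable]


def edge_scores(predicted: Iterable[Edge], gold: Iterable[Edge]) -> Dict[str, float]:
    """Precision, recall and F1 of predicted storyline edges against gold edges.

    Edges are compared as directed pairs; this matching rule is an assumption,
    not the evaluation protocol stated in the paper.
    """
    pred_set: Set[Edge] = set(predicted)
    gold_set: Set[Edge] = set(gold)
    correct = len(pred_set & gold_set)            # edges present in both sets
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    # With no correct edges, F1 is 0; this branch also avoids dividing by zero.
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def percentage_point_gap(ours: float, baseline: float) -> float:
    """Absolute difference between two scores in [0, 1], in percentage points."""
    return round((ours - baseline) * 100, 2)


if __name__ == "__main__":
    gold = {("e1", "e2"), ("e2", "e3"), ("e3", "e4")}        # toy gold storyline
    predicted = {("e1", "e2"), ("e2", "e3"), ("e2", "e4")}   # toy system output
    print(edge_scores(predicted, gold))       # precision, recall and F1 are all 2/3
    print(percentage_point_gap(0.75, 0.50))   # 25.0 points, not a 50% gain
```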
