Storyline extraction method from Weibo news based on graph convolutional network

ZHAO Xujian, WANG Chongwei

  1. School of Computer Science and Technology, Southwest University of Science and Technology
  • Received: 2021-03-24 Revised: 2021-06-03 Online: 2021-06-03
  • Corresponding author: ZHAO Xujian

Abstract: As a major platform on which people acquire and disseminate news events, Weibo contains rich event information. Extracting storylines from Weibo gives users an intuitive way to understand accurately how events evolve. However, the sparseness of Weibo data and its lack of context make storyline extraction challenging. To address this, storylines are extracted automatically from Weibo data through two consecutive tasks: 1) events are modeled according to the propagation impact of microblogs, and primary events are extracted; 2) a heterogeneous event graph is built from event features, and an Event Graph Convolutional Network (E-GCN) model is proposed to improve the learning of implicit relationships between events, predict the story branch of each event, and link events. The proposed method was evaluated on real datasets from two perspectives, story branches and storylines. For story branch generation, its F1 score is 28%, 20% and 27% higher than those of the Bayesian Model, Steiner Tree and Story Forest on dataset 1, and 19%, 12% and 22% higher on dataset 2. For storyline evaluation, its score on the correct-edges metric is 33%, 23% and 17% higher than those of Story Timeline, Steiner Tree and Story Forest on dataset 1, and 12%, 3% and 9% higher on dataset 2.

Key words: social network, Weibo, primary event, storyline, graph convolutional network
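The abstract does not give the E-GCN formulation. For orientation only, the sketch below shows the standard graph convolutional layer of Kipf and Welling, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), on the assumption that this is the generic building block the name E-GCN refers to. It treats the graph as homogeneous and does not model the heterogeneous node or edge types of the event graph described above. The function names, the toy three-event graph and the weight values are hypothetical, and plain Python lists are used so the example has no third-party dependencies.

```python
import math

# Minimal sketch of one standard GCN layer (Kipf and Welling), NOT the paper's E-GCN.
# Matrices are plain lists of lists; all names and values here are hypothetical.

def matmul(x, y):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weight):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalized_adjacency(adj), features)  # aggregate neighbours
    projected = matmul(propagated, weight)                    # linear transform
    return [[max(0.0, v) for v in row] for row in projected]  # ReLU

# Hypothetical toy graph: three events linked in a chain e0 - e1 - e2,
# each with a 2-dimensional feature vector; the 2 x 1 weight matrix is arbitrary.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0], [1.0]]
print(gcn_layer(adj, features, weight))
```

Each output row mixes an event's own features with those of its neighbours, weighted by the symmetric degree normalization, so the middle event e1, which has two neighbours, aggregates information from the whole chain in a single layer.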
