Journal of Computer Applications (《计算机应用》), 2023, Vol. 43, Issue (8): 2376-2381. DOI: 10.11772/j.issn.1001-9081.2022091377

• The 19th CCF China Conference on Information Systems and Applications (第十九届CCF中国信息系统及应用大会) •

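Hierarchical storyline generation method for hot news events

Dong LIU1,2, Chuan LIN1,2, Lina REN1,2, Ruizhang HUANG1,2

  1. State Key Laboratory of Public Big Data (Guizhou University), Guiyang, Guizhou 550025, China
  2. College of Computer Science and Technology, Guizhou University, Guiyang, Guizhou 550025, China
  • Received: 2022-09-06 Revised: 2022-10-26 Accepted: 2022-10-28 Online: 2022-12-12 Published: 2023-08-10
  • Corresponding author: Chuan LIN
  • About the authors: LIU Dong (born 1997), male, from Chengdu, Sichuan, M.S. candidate, CCF member. His research interests include natural language processing and text mining.
    REN Lina (born 1987), female, from Fuxin, Liaoning, lecturer, Ph.D. candidate, CCF member. Her research interests include natural language processing, text mining, and machine learning.
    HUANG Ruizhang (born 1979), female, from Tianjin, professor, Ph.D. Her research interests include natural language understanding, data fusion analysis, text mining, and knowledge discovery.
  • Supported by:
    National Natural Science Foundation of China (62066007)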

Abstract:

Hot news events develop in rich and varied ways: each stage of an event has its own distinct narrative, and as the event unfolds, its storyline tends to evolve hierarchically. To address the poor interpretability and lack of hierarchy in the storylines produced by existing storyline generation methods, a Hierarchical Storyline Generation Method (HSGM) for hot news events was proposed. First, an improved hotword algorithm was used to select the trunk seed events and construct the trunk storyline. Second, hotwords of branch events were selected to enhance branch interpretability. Third, within each branch, a storyline coherence selection strategy fusing hotword relevance with a dynamic time penalty was used to strengthen the connection between parent and child events, so that hierarchical hotwords, and in turn a multi-level storyline, could be built. In addition, because hot news events have an incubation period, a hatchery (incubation pool) was added to the storyline construction process so that initial events are not neglected for lack of hotness. Experimental results on two self-constructed real-world datasets show that, in event tracking, the F score of HSGM is higher than those of the singlePass-based and k-means-based methods by 4.51% and 6.41%, and by 20.71% and 13.01%, respectively; in storyline construction, HSGM performs well in accuracy, comprehensibility, and integrity on both datasets compared with Story Forest and Story Graph.

Key words: storyline, hot news event, story tree, event evolution, clustering
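
The abstract describes attaching a branch event to a parent by a coherence score that fuses hotword relevance with a time penalty, but it does not give the formulas. The following is only a minimal sketch of that idea under stated assumptions: Jaccard overlap stands in for hotword relevance, a fixed exponential decay stands in for the dynamic time penalty, and the names `Event`, `hotword_relevance`, `time_penalty`, `coherence`, `pick_parent`, and the `decay` parameter are all hypothetical, not taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Event:
    """A news event with a day index and a set of hotwords (illustrative only)."""
    name: str
    day: int                                   # hypothetical time unit: days since the first report
    hotwords: set = field(default_factory=set)


def hotword_relevance(a: Event, b: Event) -> float:
    """Jaccard overlap of the two hotword sets (an assumed stand-in for the paper's relevance)."""
    union = a.hotwords | b.hotwords
    return len(a.hotwords & b.hotwords) / len(union) if union else 0.0


def time_penalty(parent: Event, child: Event, decay: float = 0.1) -> float:
    """Exponential decay in the time gap; `decay` is an assumed constant, not the paper's dynamic penalty."""
    return math.exp(-decay * abs(child.day - parent.day))


def coherence(parent: Event, child: Event, decay: float = 0.1) -> float:
    """Fuse relevance and time penalty by multiplication (one possible fusion among many)."""
    return hotword_relevance(parent, child) * time_penalty(parent, child, decay)


def pick_parent(candidates: list, child: Event, decay: float = 0.1) -> Event:
    """Return the candidate parent with the highest coherence to the child event."""
    return max(candidates, key=lambda p: coherence(p, child, decay))
```

With this scoring, a child event attaches to the candidate that shares the most hotwords and is closest in time; a candidate with no shared hotwords scores zero regardless of how recent it is.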

CLC number: