Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (6): 1605-1612. DOI: 10.11772/j.issn.1001-9081.2016.06.1605

• Artificial Intelligence •

Burst-event evolution process expression based on short-text

CHEN Xue, HU Xiaofeng, XU Hao

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  • Received: 2015-09-21 Revised: 2015-11-30 Online: 2016-06-10 Published: 2016-06-08
  • Corresponding author: CHEN Xue
  • About the authors: CHEN Xue (born 1981), female, from Xinyang, Henan; associate professor, Ph.D., CCF member; main research interests: Semantic Web, peer-to-peer networks, and parallel architectures. HU Xiaofeng (born 1991), male, from Xiangtan, Hunan; master's student; main research interest: mining of massive Web information. XU Hao (born 1990), male, from Nantong, Jiangsu; master's student; main research interest: mining of massive Web information.
  • Supported by:
    This work is partially supported by the Innovation Program of Shanghai Municipal Education Commission (B.10-0108-14-202).

Abstract: Current short-text-based methods for analyzing burst events cannot describe how an event evolves in a simple and accurate manner. To address this problem, a new method was proposed for expressing the evolution process of a burst event from short-text data sets. Firstly, an event status value was proposed to describe the state of the event at each time point, so that users can analyze how the event develops. Secondly, based on the structured information of short texts, the event status value was considered from two aspects: text information and user information. Thirdly, taking the impact factors of text information into account, formulas were constructed to calculate the weight of text information. Fourthly, taking the impact factors of user information into account, a modified PageRank algorithm and a user-layering scheme were proposed, and formulas were constructed to calculate the weight of user information. Finally, the event status value was calculated from the weight of text information and the weight of user information. The experimental results show that successively adding user information, the modified PageRank algorithm, and user layering each corrects 1 to 2 description points and improves the accuracy of the expressed evolution process.

Key words: event analysis, PageRank, layering, short-text, status value
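
The pipeline outlined in the abstract (user weights from PageRank, users divided into layers, then a per-time-point status value combining text and user weights) can be illustrated with a rough sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the standard PageRank iteration stands in for the paper's modified variant, the equal-size layering rule, the mixing parameter `alpha`, and the names `pagerank`, `layer_weights`, and `event_status` are all hypothetical and are not the authors' definitions.

```python
from collections import defaultdict


def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (NOT the paper's modified variant).

    links maps each user to the list of users they point to,
    for example the users they repost or follow (an assumed graph).
    """
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


def layer_weights(rank, layer_values=(1.0, 2.0, 3.0)):
    """Assign each user a weight by layer (hypothetical rule).

    Users are sorted by rank and split into equal-size layers;
    the lowest-ranked layer receives layer_values[0].
    """
    ordered = sorted(rank, key=lambda u: (rank[u], u))  # ties broken by name
    k = len(layer_values)
    return {
        u: layer_values[min(i * k // len(ordered), k - 1)]
        for i, u in enumerate(ordered)
    }


def event_status(posts, user_weight, alpha=0.5):
    """Sum a weighted score per time bucket (assumed linear combination).

    posts is an iterable of (time_bucket, user, text_weight) tuples.
    """
    status = defaultdict(float)
    for bucket, user, text_weight in posts:
        status[bucket] += alpha * text_weight + (1.0 - alpha) * user_weight.get(user, 0.0)
    return dict(status)


if __name__ == "__main__":
    links = {"a": ["c"], "b": ["c"], "c": ["a"]}
    weights = layer_weights(pagerank(links))
    posts = [(0, "b", 1.0), (0, "c", 1.0), (1, "a", 2.0)]
    print(event_status(posts, weights))
```

Plotting the resulting status value against the time bucket would give a curve of the kind the abstract describes for tracking an event's development; the actual weights in the paper depend on formulas not reproduced on this page.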

CLC number: