Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (11): 3263-3267. DOI: 10.11772/j.issn.1001-9081.2019040647

• Artificial Intelligence •

基于多特征的微博突发事件检测算法

王雪颖1, 杨文忠1, 张志豪1, 李东昊1, 秦旭2   

  1. 新疆大学 信息科学与工程学院, 乌鲁木齐 830046;
  2. 新疆大学 软件学院, 乌鲁木齐 830046
  • 收稿日期:2019-04-17 修回日期:2019-07-02 出版日期:2019-11-10 发布日期:2019-08-21
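  • Corresponding author: YANG Wenzhong
  • About the authors: WANG Xueying (born 1995), female, from Karamay, Xinjiang, M.S. candidate, main research interest: natural language processing; YANG Wenzhong (born 1971), male, from Urumqi, Xinjiang, associate professor, Ph.D., CCF member, main research interests: public opinion analysis, information security, machine learning; ZHANG Zhihao (born 1995), male, from Manas, Xinjiang, M.S. candidate, main research interests: bursty event early warning, information security; LI Donghao (born 1994), male, from Urumqi, Xinjiang, M.S. candidate, main research interest: natural language processing; QIN Xu (born 1994), female, from Fuxin, Liaoning, M.S. candidate, main research interests: natural language processing, public opinion analysis.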
  • 基金资助:
    国家自然科学基金资助项目(U1603115,U1435215);新疆维吾尔自治区高校科研计划项目创新团队(XJEDU2017T002);新疆维吾尔自治区自然科学基金资助项目(2017D01C042)。

Microblog bursty events detection algorithm based on multi-feature

WANG Xueying1, YANG Wenzhong1, ZHANG Zhihao1, LI Donghao1, QIN Xu2   

  1. College of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China;
  2. College of Software, Xinjiang University, Urumqi Xinjiang 830046, China
  • Received: 2019-04-17 Revised: 2019-07-02 Online: 2019-08-21 Published: 2019-11-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1603115, U1435215), the Xinjiang Uygur Autonomous Region University Research Project Innovation Team (XJEDU2017T002), and the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2017D01C042).

摘要: 为了降低社交媒体中突发事件带来的危害,提出一种基于多特征的微博突发事件检测算法。该算法融合了文本情感过滤和用户影响力计算方法。首先,通过噪声过滤和情感过滤得到饱含负面情感的微博文本;然后,采用提出的用户影响力计算方法并结合突发词提取算法来提取突发词特征;最后,引入凝聚式层次聚类算法对突发词集进行聚类,从中提取突发事件。通过实验检测,准确率为66.84%,验证了该方法能有效地对突发事件进行检测。

关键词: 突发事件, 用户影响力, 情感过滤, 突发词, 聚类

Abstract: To reduce the harm caused by bursty events on social media, a multi-feature microblog bursty event detection algorithm is proposed. The algorithm combines text sentiment filtering with a user influence calculation method. First, noise filtering and sentiment filtering are applied to obtain microblog texts rich in negative sentiment. Then, the proposed user influence calculation method is combined with a burst word extraction algorithm to extract burst word features. Finally, an agglomerative hierarchical clustering algorithm clusters the burst word set, and bursty events are extracted from the resulting clusters. In experiments, the method achieved an accuracy of 66.84%, indicating that it can detect bursty events effectively.

Key words: bursty event, user influence, sentiment filtering, burst word, clustering
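
The three stages named in the abstract (sentiment filtering, influence-weighted burst word extraction, agglomerative clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy lexicon `NEGATIVE_WORDS`, the burst ratio test with `min_ratio`, the Jaccard co-occurrence similarity, the average-linkage rule, the `threshold` value, and the sample posts are all assumptions made for illustration, since the abstract does not specify these details.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy lexicon; the paper's sentiment resources are not given here.
NEGATIVE_WORDS = {"fire", "explosion", "injured", "panic", "collapse"}

def is_negative(tokens):
    """Keep a post if it contains at least one word from the toy lexicon."""
    return any(t in NEGATIVE_WORDS for t in tokens)

def weighted_freq(posts):
    """Sum the author's influence over the posts that mention each word."""
    freq = defaultdict(float)
    for tokens, influence in posts:
        for t in set(tokens):
            freq[t] += influence
    return freq

def burst_words(current, history, min_ratio=2.0):
    """Words whose weighted frequency now is at least min_ratio times
    their smoothed mean weighted frequency over the past windows."""
    now = weighted_freq(current)
    past = [weighted_freq(window) for window in history]
    result = set()
    for word, score in now.items():
        baseline = sum(p.get(word, 0.0) for p in past) / max(len(past), 1)
        if score >= min_ratio * (baseline + 1.0):  # +1 smooths unseen words
            result.add(word)
    return result

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_words(words, posts, threshold=0.5):
    """Agglomerative (bottom-up) clustering with average linkage over the
    Jaccard similarity of the sets of posts in which each word occurs."""
    occ = {w: {i for i, (tokens, _) in enumerate(posts) if w in tokens}
           for w in words}
    clusters = [{w} for w in sorted(words)]

    def sim(c1, c2):
        return sum(jaccard(occ[a], occ[b]) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), sim(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters

# Toy data: each post is (tokens, user influence weight); values are made up.
history = [
    [(["today", "weather"], 2.0), (["today", "lunch"], 2.0)],
    [(["today", "traffic"], 4.0)],
]
raw_posts = [
    (["fire", "mall", "today"], 3.0),
    (["fire", "mall", "injured"], 2.0),
    (["explosion", "factory", "panic"], 4.0),
    (["explosion", "factory"], 1.0),
    (["today", "lunch"], 5.0),  # no negative word, so it is filtered out
]
current = [post for post in raw_posts if is_negative(post[0])]
bursts = burst_words(current, history)
events = cluster_words(bursts, current)
print(events)
```

In this sketch each resulting cluster of burst words stands for one candidate event. Weighting word counts by user influence means a word used by a few influential accounts can qualify as a burst word even when its raw count is low, which is one plausible reading of how the two features are combined.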

CLC Number: