Journal of Computer Applications, 2020, Vol. 40, Issue (12): 3458-3464. DOI: 10.11772/j.issn.1001-9081.2020060880

• 2020 China Conference on Granular Computing and Knowledge Discovery (CGCKD 2020) •

Detection of negative emotion burst topic in microblog text stream

LI Yanhong1,2, ZHAO Hongwei1,2, WANG Suge1,2, LI Deyu1,2

  1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China;
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing, Ministry of Education (Shanxi University), Taiyuan, Shanxi 030006, China
  • Received: 2020-06-12 Revised: 2020-08-20 Online: 2020-12-10 Published: 2020-10-20
  • Corresponding author: LI Yanhong (born 1977), female, from Linfen, Shanxi; associate professor, Ph.D., CCF member. Research interests: data mining, machine learning. liyh@sxu.edu.cn
  • About the authors: ZHAO Hongwei (born 1995), male, from Datong, Shanxi; M.S. candidate. Research interests: data mining, machine learning. WANG Suge (born 1964), female, from Taiyuan, Shanxi; professor, Ph.D., CCF member. Research interests: Chinese information processing, machine learning. LI Deyu (born 1965), male, from Taiyuan, Shanxi; professor, Ph.D., CCF member. Research interests: intelligent computing, data mining.
  • Supported by:
    This work is partially supported by the Key Research and Development Program of Shanxi Province (201803D421024).

Abstract: Promptly detecting negative emotion burst topics in massive, noisy microblog text streams is essential for emergency response and handling. However, traditional burst topic detection methods often ignore the differences between negative emotion burst topics and non-negative emotion burst topics. Therefore, a Negative Emotion Burst Topic Detection (NE-BTD) algorithm for microblog text streams was proposed. Firstly, the acceleration of topic word pairs in microblogs and the change rate of negative emotion intensity were used as the basis for identifying negative emotion burst topics. Secondly, the speed of burst word pairs was used to determine the window range of each negative emotion burst topic. Finally, a clustering algorithm based on the Gibbs Sampling Dirichlet Multinomial Mixture (GSDMM) model was used to obtain the topic structure of the negative emotion burst topics in the window. In the experiments, NE-BTD was compared with an existing Emotion-Based Method of Topic Detection (EBM-TD) algorithm. The results show that, compared with EBM-TD, NE-BTD improved accuracy and recall by at least 20% and detected negative emotion burst topics at least 40 minutes earlier.

Key words: microblog, text stream, burst topic, negative emotion, Dirichlet multinomial mixture model
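The first step described in the abstract (flagging word pairs by their acceleration together with the change rate of negative emotion intensity) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption: speed is taken as the first difference of a per-slice series, acceleration as the second difference, and the thresholds `acc_min` and `neg_rate_min`, the function names, and the precomputed `neg_intensity` series are hypothetical. This is not the authors' implementation, and it covers neither the window-range step nor the GSDMM clustering.

```python
from collections import Counter
from itertools import combinations


def pair_counts(slice_posts):
    """Count unordered word pairs co-occurring within each post of one time slice."""
    counts = Counter()
    for words in slice_posts:
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return counts


def speed(series, t):
    """Assumed definition: first difference of a per-slice series at slice t."""
    return series[t] - series[t - 1]


def acceleration(series, t):
    """Assumed definition: second difference (change in speed) at slice t."""
    return speed(series, t) - speed(series, t - 1)


def negative_burst_pairs(slices, neg_intensity, t, acc_min=2, neg_rate_min=0.1):
    """Return the word pairs flagged at slice t (requires t >= 2).

    slices        : list of time slices, each a list of tokenized posts
    neg_intensity : per-slice negative emotion intensity (assumed precomputed)
    acc_min, neg_rate_min : hypothetical thresholds, not values from the paper
    """
    # Require negative emotion intensity to be rising fast enough.
    if speed(neg_intensity, t) < neg_rate_min:
        return []
    # Word-pair counts for slices t-2, t-1, and t.
    counts = [pair_counts(s) for s in slices[t - 2 : t + 1]]
    flagged = []
    for pair in counts[2]:
        series = [c[pair] for c in counts]  # Counter yields 0 for unseen pairs
        if acceleration(series, 2) >= acc_min:
            flagged.append(pair)
    return sorted(flagged)


# Toy stream: three time slices of already-tokenized posts.
slices = [
    [["rain", "city"]],
    [["rain", "city"]],
    [["flood", "rescue"], ["flood", "rescue", "city"], ["flood", "rescue"]],
]
flagged = negative_burst_pairs(slices, [0.1, 0.1, 0.6], t=2)
```

In this toy stream, only the pair whose count jumps sharply in the last slice is flagged, and nothing is flagged when negative emotion intensity stays flat. A real system would still need word segmentation and a sentiment lexicon or classifier to produce the intensity series.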
