Journal of Computer Applications, 2013, Vol. 33, Issue (11): 3076-3079.

• Database Technology •

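Micro-blog hot topics detection method based on user role orientation

YANG Wu, LI Yang, LU Ling

  1. School of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2013-05-10 Revised: 2013-07-16 Published: 2013-11-01 Online: 2013-12-04
  • Corresponding author: LI Yang
  • About the authors: YANG Wu (1965-), male, from Wuhan, Hubei, professor; research interest: information retrieval. LI Yang (1989-), female, from Ruzhou, Henan, M.S. candidate; research interest: information retrieval. LU Ling (1975-), female, from Jiaozuo, Henan, lecturer, M.S.; research interests: information retrieval, text information mining.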

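Abstract: To address the low efficiency of extracting hot topics from massive volumes of micro-blog data, a new hot topic detection method built on user role classification is proposed. First, each user's role is located according to user attention degree, and noise data from some users are filtered out. Second, feature weights are calculated with a Term Frequency-Inverse Document Frequency (TF-IDF) function combined with semantic similarity, reducing the error caused by different forms of semantic expression. Then, an improved Single-Pass clustering algorithm clusters the posts to extract micro-blog topics. Finally, topic heat is evaluated and ranked according to the number of reposts, comments and similar indicators, so that hot topics are identified. Experimental results show that the proposed method reduces the missing rate and the false detection rate by 12.09% and 2.37% on average respectively, effectively improving topic detection accuracy and verifying the feasibility of the method.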
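The clustering step builds on the classic Single-Pass algorithm: each post is compared with the clusters formed so far and either joins the most similar one or starts a new topic. The sketch below is a minimal, hypothetical illustration of that classic algorithm, not the authors' improved variant. It assumes posts are already word-segmented into token lists, uses plain smoothed TF-IDF with cosine similarity (no semantic-similarity term), and the names `tfidf_vectors`, `cosine`, `single_pass` and the threshold 0.3 are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dict term -> weight) for pre-tokenized posts.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms occurring
    in every post still get a non-zero weight. No semantic-similarity term.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.3):
    """Classic Single-Pass clustering; returns lists of post indices."""
    centroids = []  # running sum of member vectors (cosine ignores scale)
    clusters = []   # member indices of each cluster
    for i, vec in enumerate(vectors):
        # Find the most similar existing cluster.
        best, best_sim = -1, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best >= 0 and best_sim >= threshold:
            # Join that cluster and fold the post into its centroid.
            clusters[best].append(i)
            for term, w in vec.items():
                centroids[best][term] = centroids[best].get(term, 0.0) + w
        else:
            # Too dissimilar to every cluster: start a new topic.
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


# Toy, already-segmented posts (hypothetical data).
posts = [
    ["earthquake", "sichuan", "rescue"],
    ["sichuan", "earthquake", "donation"],
    ["phone", "launch", "price"],
    ["new", "phone", "price"],
]
print(single_pass(tfidf_vectors(posts)))  # [[0, 1], [2, 3]]
```

Single-Pass reads each post only once, which suits streaming data such as micro-blog posts; the result depends on arrival order and on the threshold, and raising `threshold` produces more, tighter clusters.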

Key words: micro-blog, topic detection, user role, semantic similarity, Single-Pass clustering

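The missing rate and false detection rate reported in the abstract are not defined on this page. The sketch below assumes the usual TDT-style definitions: the miss rate is the fraction of a topic's relevant posts that the detected cluster omits, and the false detection (false alarm) rate is the fraction of non-relevant posts that the cluster wrongly includes. The function name `detection_rates` and the toy data are hypothetical.

```python
def detection_rates(detected, relevant, corpus_size):
    """Return (miss rate, false-alarm rate) for one topic.

    Assumes TDT-style definitions (not confirmed by the source):
      miss rate        = relevant posts not detected / relevant posts
      false-alarm rate = detected non-relevant posts / non-relevant posts in corpus
    """
    detected, relevant = set(detected), set(relevant)
    non_relevant = corpus_size - len(relevant)
    missed = len(relevant - detected)
    false_alarms = len(detected - relevant)
    p_miss = missed / len(relevant) if relevant else 0.0
    p_fa = false_alarms / non_relevant if non_relevant else 0.0
    return p_miss, p_fa


# Toy example: 10 posts (ids 0-9), of which posts 0-3 belong to the topic.
p_miss, p_fa = detection_rates(detected=[0, 1, 2, 7], relevant=[0, 1, 2, 3], corpus_size=10)
print(f"miss rate = {p_miss:.3f}, false-alarm rate = {p_fa:.3f}")
# miss rate = 0.250, false-alarm rate = 0.167
```

Per-topic rates like these are typically averaged over all evaluated topics.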

