Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (2): 563-567. DOI: 10.11772/j.issn.1001-9081.2016.02.0563

Keyword extraction method for microblog based on hashtag

YE Jingjing, LI Lin, ZHONG Luo

  1. School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China
  • Received: 2015-09-02 Revised: 2015-10-24 Online: 2016-02-03 Published: 2016-02-10
  • Corresponding author: LI Lin (1977-), female, from Hengyang, Hunan, associate professor, Ph.D., CCF member. Main research interests: social computing, information retrieval and recommender systems.
  • About the authors: YE Jingjing (1992-), female, from Yancheng, Jiangsu, master's student. Main research interests: natural language processing and big data analysis. ZHONG Luo (1957-), male, from Wuhan, Hubei, professor, Ph.D., CCF member. Main research interests: intelligent methods and software engineering.

Abstract: To address the low accuracy of keyword extraction from microblogs, a hashtag-first extraction and ranking method is proposed. The method uses hashtags, a social feature inherent to microblogs, to extract keywords from a collection of microblog posts. It first builds a weighted graph between the initial words and the posts from the microblogs themselves. A hashtag-based random walk is then run on this graph, in which the walker repeatedly jumps to hashtag word nodes. After a series of iterations the stationary probability of each word is obtained, and these probabilities determine the final ranking of the words. The method was evaluated on real posts from the Sina Weibo platform. The results show that, compared with extracting keywords from a word-word weighted graph, the hashtag-based method improves keyword extraction accuracy by 50% and can effectively improve extraction accuracy in practical applications.

Key words: keyword extraction, microblog, hashtag, random walk, weighting strategy
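
The procedure outlined in the abstract (a word-post weighted graph, a random walk that keeps jumping back to hashtag nodes, and ranking by stationary probability) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the edge-weighting strategy, the jump probability, or the stopping rule, so the term-frequency weights, the `damping` value of 0.85, the uniform restart over hashtag words, and the function name `rank_words` are all assumptions chosen for illustration. Posts are assumed to be already segmented into tokens.

```python
from collections import defaultdict


def rank_words(posts, hashtags, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank words by a random walk with restart at hashtag word nodes.

    posts:    list of token lists (one list per microblog post)
    hashtags: set of words that appear as hashtags
    All parameter defaults are illustrative assumptions.
    """
    # Undirected bipartite graph between word nodes and post nodes.
    # Edge weight = term frequency of the word in the post (assumed weighting).
    weights = defaultdict(dict)
    for i, tokens in enumerate(posts):
        post = ("post", i)
        for tok in tokens:
            word = ("word", tok)
            weights[post][word] = weights[post].get(word, 0.0) + 1.0
            weights[word][post] = weights[word].get(post, 0.0) + 1.0

    nodes = list(weights)
    restart_nodes = [("word", h) for h in hashtags if ("word", h) in weights]
    if not restart_nodes:
        raise ValueError("no hashtag word occurs in the posts")

    # The walker restarts uniformly on hashtag word nodes.
    restart = {n: 0.0 for n in nodes}
    for n in restart_nodes:
        restart[n] = 1.0 / len(restart_nodes)

    out_sum = {n: sum(weights[n].values()) for n in nodes}
    prob = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(max_iter):
        # With probability (1 - damping) jump to a hashtag node,
        # otherwise follow an edge chosen in proportion to its weight.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * prob[n] / out_sum[n]
            for m, w in weights[n].items():
                nxt[m] += share * w
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:  # probabilities have stopped changing
            break

    words = [(n[1], p) for n, p in prob.items() if n[0] == "word"]
    return sorted(words, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    demo_posts = [
        ["apple", "iphone", "launch"],
        ["apple", "price"],
        ["weather", "rain"],
    ]
    for word, score in rank_words(demo_posts, {"apple"}):
        print(word, round(score, 4))
```

In this sketch, words that share posts with a hashtag accumulate probability, while words in posts unconnected to any hashtag decay toward zero, which is one way to read the "hashtag-first" ranking described above.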
