Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (2): 424-427. DOI: 10.11772/j.issn.1001-9081.2016.02.0424

• The 3rd CCF Big Data Conference (CCF BigData 2015)

基于词语相关度的微博新情感词自动识别

陈鑫1, 王素格1,2, 廖健1   

  1. 山西大学 计算机与信息技术学院, 太原 030006;
  2. 计算智能与中文信息处理教育部重点实验室(山西大学), 太原 030006
  • Received: 2015-08-29 Revised: 2015-09-13 Published: 2016-02-10 Online: 2016-02-03
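  • Corresponding author: WANG Suge (1964-), female, born in Dingzhou, Hebei; professor, Ph.D., CCF member. Main research interest: natural language processing.
  • About the authors: CHEN Xin (1992-), female, born in Changzhi, Shanxi; M.S. candidate, CCF student member. Main research interest: text sentiment analysis. LIAO Jian (1990-), male, born in Ezhou, Hubei; Ph.D. candidate, CCF student member. Main research interest: text sentiment analysis.
  • Supported by:
    National High Technology Research and Development Program (863 Program) of China (2015AA015407); National Natural Science Foundation of China (61175067, 61272095, 61432011, 61573231, U1435212); Shanxi Province Science and Technology Basic Condition Platform Program (2015091001-0102); Shanxi Province Research Project for Returned Overseas Scholars (2013-014).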

Automatic identification of new sentiment word about microblog based on word association

CHEN Xin1, WANG Suge1,2, LIAO Jian1   

  1. School of Computer and Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China;
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education (Shanxi University), Taiyuan, Shanxi 030006, China
  • Received: 2015-08-29 Revised: 2015-09-13 Published: 2016-02-10 Online: 2016-02-03

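Abstract: To address the identification of new sentiment words in microblogs, an automatic identification method based on word association is proposed. First, because word segmentation software may wrongly split a new word into several words, adjacent words are merged, following a combination strategy, to form candidate new words. Second, to make full use of the semantic information in word contexts, a neural network is trained on the corpus to obtain spatial representation vectors for the candidate new words. Finally, guided by existing sentiment lexicons, an association ranking algorithm based on the lexicon set is fused with a maximum-association ranking algorithm to filter the candidates and obtain the final new sentiment words. On the Task 3 corpus of COAE2014 (the 6th Chinese Opinion Analysis Evaluation), the fused algorithm improves precision by at least 22% over Pointwise Mutual Information (PMI), Enhanced Mutual Information (EMI), Multi-word Expression Distance (MED), New Word Probability (NWP), and a word-vector-based new word identification method, indicating that it identifies new microblog sentiment words more effectively than these five methods.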

Keywords: sentiment word identification, word association, word vector, ranking algorithm, microblog

Abstract: To identify new sentiment words in microblogs, an automatic identification method based on word association was proposed. Firstly, because a Chinese word segmentation system may incorrectly split a new word into several words, adjacent words were merged to form candidate words. Secondly, to make full use of the semantic information in word contexts, spatial representation vectors of the candidate words were obtained by training a neural network. Finally, using existing sentiment lexicons as a guide, the association-sort algorithm based on the vocabulary list was combined with the max association-sort algorithm to select the final new sentiment words from the candidate words. The experimental results on Task 3 of COAE2014 show that the precision of the proposed method is at least 22% higher than that of Pointwise Mutual Information (PMI), Enhanced Mutual Information (EMI), Normalized Multi-word Expression Distance (NMED), New Word Probability (NWP), and a word-embedding-based method for identifying new sentiment words, which demonstrates the effectiveness of the proposed method.

Key words: sentiment word recognition, word association, word vector, sort algorithm, microblog
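
The three steps summarized in the abstract (merging adjacent tokens into candidates, representing candidates as vectors, and ranking them against a sentiment lexicon) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how association is computed or how the two rankings are fused, so the cosine similarity, the mean over the lexicon as the "set-based" score, the linear weight `alpha`, and all function names here are assumptions. The vectors are hand-written toy values standing in for embeddings that the paper obtains by training a neural network.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Vector = Sequence[float]


def merge_adjacent(tokens: Sequence[str], known: Iterable[str] = (),
                   max_parts: int = 2) -> Set[str]:
    """Join runs of 2..max_parts adjacent tokens into candidate new words."""
    known_set = set(known)
    candidates: Set[str] = set()
    for n in range(2, max_parts + 1):
        for i in range(len(tokens) - n + 1):
            merged = "".join(tokens[i:i + n])
            if merged not in known_set:  # skip words already in a dictionary
                candidates.add(merged)
    return candidates


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(cand_vecs: Dict[str, Vector],
                    lexicon_vecs: Dict[str, Vector],
                    alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Rank candidates by a weighted mix of two association scores.

    set score: mean similarity to all lexicon words (assumed definition)
    max score: similarity to the closest lexicon word (assumed definition)
    """
    scored = []
    for word, vec in cand_vecs.items():
        sims = [cosine(vec, lex_vec) for lex_vec in lexicon_vecs.values()]
        if not sims:
            continue
        set_score = sum(sims) / len(sims)
        max_score = max(sims)
        scored.append((word, alpha * set_score + (1 - alpha) * max_score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: a segmenter has split the new word "给力" into "给" and "力".
tokens = ["这", "场", "比赛", "太", "给", "力", "了"]
candidates = merge_adjacent(tokens, known={"比赛"})

# Hypothetical 2-d vectors; real ones would come from a trained model.
toy_vectors = {"给力": [0.95, 0.05], "力了": [0.0, 1.0]}
lexicon = {"好": [1.0, 0.0], "棒": [0.9, 0.1]}

cand_vecs = {w: v for w, v in toy_vectors.items() if w in candidates}
ranked = rank_candidates(cand_vecs, lexicon)
print(ranked)
```

In this toy run the candidate whose vector lies close to the lexicon words is ranked first. Setting `alpha` to 1.0 or 0.0 reduces the score to the set-based or the maximum-based ranking alone, which makes it easy to compare the two before mixing them.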

CLC number: