Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (8): 2076-2081. DOI: 10.11772/j.issn.1001-9081.2016.08.2076

• The 6th China Conference on Data Mining (CCDM 2016)

Emotion classification for news readers based on multi-category semantic word clusters

WEN Wen1, WU Biao1, CAI Ruichu1, HAO Zhifeng1,2, WANG Lijuan1

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou Guangdong 510006, China;
  2. School of Mathematics and Big Data, Foshan University, Foshan Guangdong 528000, China
  • Received: 2016-03-01 Revised: 2016-04-28 Online: 2016-08-10 Published: 2016-08-10
  • Corresponding author: WU Biao
  • About the authors: WEN Wen (1981-), female, born in Ganzhou, Jiangxi; associate professor, Ph.D., CCF member; research interests: machine learning, pattern recognition, information retrieval. WU Biao (1991-), male, born in Lufeng, Guangdong; M.S. candidate; research interests: data mining, pattern recognition. CAI Ruichu (1983-), male, born in Wenzhou, Zhejiang; professor, Ph.D., CCF senior member; research interests: data mining, machine learning, information retrieval. HAO Zhifeng (1968-), male, born in Suzhou, Jiangsu; professor, Ph.D., CCF member; research interests: machine learning, artificial intelligence. WANG Lijuan (1978-), female, born in Xingtai, Hebei; lecturer, Ph.D.; research interests: machine learning, clustering analysis of high-dimensional data.
  • Funding:
    National Natural Science Foundation of China (61202269, 61472089).
  • Supported by:
    This work is partly supported by the National Natural Science Foundation of China (61375059), the Specialized Research Fund for the Doctoral Program of Higher Education (20121103110031), the Beijing Municipal Education Research Plan Key Project (KZ201410005004), the Opening Project of State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Corp.

Abstract: Analyzing the emotions of text readers helps to detect negative information on the Internet and is an important part of public opinion monitoring. Because the main factor that evokes different emotions in readers is the semantic content of the text, extracting semantic features from the text is an important problem. To address it, the word2vec model was first used to build an initial semantic representation of the text. On this basis, representative semantic word clusters were constructed separately for each emotion category, and a selection criterion was then applied to keep the clusters that are effective for discriminating between categories, so that the traditional word-vector representation of a text was replaced by a vector over semantic word clusters. Finally, a multi-label classification method was used to learn and predict the emotion labels. Experimental results show that the proposed method achieves better accuracy and stability than representative existing methods.

Key words: sentiment analysis, emotion classification, semantic word cluster, multi-label learning, word2vec
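
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `embeddings` dictionary stands in for trained word2vec vectors, the tiny k-means routine stands in for whatever clustering the paper uses, and the `min_gap` rule in `select_clusters` is a hypothetical selection criterion, since the abstract does not state the actual one. All function names are chosen here for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means over equal-length float lists; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def build_clusters(docs, labels, embeddings, k_per_label):
    """Cluster the embedded words of each emotion category separately."""
    clusters = []  # list of (label, centroid) pairs
    for label in sorted({l for ls in labels for l in ls}):
        words = sorted({w for d, ls in zip(docs, labels) if label in ls
                        for w in d if w in embeddings})
        points = [embeddings[w] for w in words]
        k = min(k_per_label, len(points))
        clusters += [(label, c) for c in kmeans(points, k)]
    return clusters


def select_clusters(clusters, min_gap):
    """Hypothetical criterion (an assumption, not from the paper): drop a
    cluster whose centroid lies closer than min_gap to a centroid of a
    different category, as it is unlikely to discriminate well."""
    return [(lab, c) for lab, c in clusters
            if all(math.dist(c, o) >= min_gap
                   for other, o in clusters if other != lab)]


def cluster_vector(doc, embeddings, clusters):
    """Count how many words of the document fall nearest to each cluster."""
    vec = [0] * len(clusters)
    for w in doc:
        if w in embeddings:
            i = min(range(len(clusters)),
                    key=lambda j: math.dist(embeddings[w], clusters[j][1]))
            vec[i] += 1
    return vec


# Toy stand-ins for word2vec vectors and multi-label news data (assumed).
embeddings = {"win": [1.0, 0.0], "cheer": [0.9, 0.1],
              "flood": [0.0, 1.0], "loss": [0.1, 0.9]}
docs = [["win", "cheer"], ["flood", "loss"], ["win", "loss"]]
labels = [{"joy"}, {"sadness"}, {"joy", "sadness"}]

clusters = select_clusters(
    build_clusters(docs, labels, embeddings, k_per_label=1), min_gap=0.3)
features = [cluster_vector(d, embeddings, clusters) for d in docs]
print([lab for lab, _ in clusters], features)
```

The resulting `features` rows are the cluster-level document vectors; in a full system they would be passed to a multi-label classifier of choice, which this sketch leaves out.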

CLC number: