Emotion classification for news readers based on multi-category semantic word clusters

doi:10.11772/j.issn.1001-9081.2016.08.2076

Journal of Computer Applications, 2016, Vol. 36, Issue (8): 2076-2081.

Section: The 6th China Conference on Data Mining (CCDM 2016)

WEN Wen¹, WU Biao¹, CAI Ruichu¹, HAO Zhifeng¹,², WANG Lijuan¹

1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou Guangdong 510006, China;
2. School of Mathematics and Big Data, Foshan University, Foshan Guangdong 528000, China

Received: 2016-03-01  Revised: 2016-04-28  Online: 2016-08-10  Published: 2016-08-10
Corresponding author: WU Biao
About the authors: WEN Wen (1981-), female, from Ganzhou, Jiangxi, associate professor, Ph.D., CCF member; research interests: machine learning, pattern recognition, information retrieval. WU Biao (1991-), male, from Lufeng, Guangdong, M.S. candidate; research interests: data mining, pattern recognition. CAI Ruichu (1983-), male, from Wenzhou, Zhejiang, professor, Ph.D., CCF senior member; research interests: data mining, machine learning, information retrieval. HAO Zhifeng (1968-), male, from Suzhou, Jiangsu, professor, Ph.D., CCF member; research interests: machine learning, artificial intelligence. WANG Lijuan (1978-), female, from Xingtai, Hebei, lecturer, Ph.D.; research interests: machine learning, high-dimensional data clustering analysis.
Funding: National Natural Science Foundation of China (61202269, 61472089).
Supported by:
This work is partly supported by the National Natural Science Foundation of China (61375059), the Specialized Research Fund for the Doctoral Program of Higher Education (20121103110031), the Beijing Municipal Education Research Plan Key Project (KZ201410005004), the Opening Project of State Key Laboratory of Digital Publishing Technology, Peking University Founder Group Corp.

Abstract

Analyzing the emotions that a text arouses in its readers helps to discover negative information on the Internet and is an important part of public opinion monitoring. Because the main factor behind readers' different emotions is the semantic content of the text, extracting semantic features from the text becomes a key problem. To address it, the word2vec model is first used to give the text an initial semantic representation. On this basis, representative semantic word clusters are constructed separately for each emotion category, and a selection criterion is then applied to keep the word clusters that are effective for deciding the category, so that the traditional word-vector representation of a text is replaced by a vector representation over semantic word clusters. Finally, a multi-label classification method is used to learn and predict the emotion labels. Experimental results show that the proposed method achieves better accuracy and stability than existing representative methods.

Key words: sentiment analysis, emotion classification, semantic word cluster, multi-label learning, word2vec
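The pipeline summarized in the abstract (cluster each emotion category's words, keep the useful clusters, re-encode each text as a vector over the kept clusters, then classify with a multi-label method) can be illustrated with a small, self-contained Python sketch. It is a toy under stated assumptions, not the authors' implementation: the 2-D word vectors and per-category candidate word lists are invented stand-ins for a trained word2vec model, each category's words are grouped with plain k-means, the abstract's unspecified selection criterion is replaced by a hypothetical minimum-cluster-size filter, and the multi-label step is a simple majority-vote k-nearest-neighbour rule. All function names (`build_word_clusters`, `cluster_vector`, `knn_multilabel`) are hypothetical.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10):
    """Plain k-means; centroids start at the first k points (deterministic).

    Returns one list of point indices per centroid (a list may be empty).
    """
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each non-empty centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [[i for i, a in enumerate(assign) if a == c] for c in range(k)]


def build_word_clusters(embeddings, category_words, k=2, min_size=2):
    """Cluster each emotion category's words separately, then keep useful clusters.

    ASSUMPTION: 'useful' is approximated by a minimum cluster size; the
    paper's actual selection criterion is not reproduced here.
    """
    kept = []
    for category, words in category_words.items():
        words = [w for w in words if w in embeddings]
        if not words:
            continue
        groups = kmeans([embeddings[w] for w in words], min(k, len(words)))
        for idx in groups:
            if len(idx) >= min_size:
                kept.append((category, frozenset(words[i] for i in idx)))
    return kept


def cluster_vector(tokens, clusters):
    """Represent a text by how many of its tokens fall in each kept cluster."""
    return [sum(1 for t in tokens if t in members) for _, members in clusters]


def knn_multilabel(train_x, train_y, x, k=3):
    """Predict every label carried by a strict majority of the k nearest texts.

    This is a plain majority vote, not ML-KNN's Bayesian decision rule.
    """
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))[:k]
    votes = Counter(label for i in nearest for label in train_y[i])
    return {label for label, n in votes.items() if 2 * n > k}


# Hypothetical 2-D vectors standing in for word2vec output.
embeddings = {
    "angry": (1.0, 0.0), "furious": (0.9, 0.1), "outrage": (0.95, 0.05),
    "happy": (0.0, 1.0), "joy": (0.1, 0.9), "glad": (0.05, 0.95),
    "sad": (-1.0, 0.0), "grief": (-0.9, -0.1),
}
# Hypothetical candidate words per category; each list contains one stray word.
category_words = {
    "anger": ["angry", "happy", "furious", "outrage"],
    "joy": ["happy", "sad", "joy", "glad"],
    "sadness": ["sad", "grief", "angry"],
}
clusters = build_word_clusters(embeddings, category_words)

train_docs = [
    (["angry", "outrage"], {"anger"}),
    (["furious", "angry", "sad"], {"anger", "sadness"}),
    (["happy", "glad"], {"joy"}),
    (["joy", "happy", "glad"], {"joy"}),
    (["grief", "sad"], {"sadness"}),
    (["outrage", "grief", "sad"], {"anger", "sadness"}),
]
train_x = [cluster_vector(tokens, clusters) for tokens, _ in train_docs]
train_y = [labels for _, labels in train_docs]

query = cluster_vector(["outrage", "furious", "grief"], clusters)
print(sorted(knn_multilabel(train_x, train_y, query)))  # → ['anger', 'sadness']
```

With these toy inputs, the stray word planted in each category's list lands in a singleton cluster and is filtered out, and a text mixing anger and grief words receives both the `anger` and `sadness` labels. Replacing the toy vectors with real word2vec output, the filter with the paper's own criterion, or the vote with a dedicated multi-label learner would each change only the corresponding function.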

CLC number: TP391

Cite this article: WEN Wen, WU Biao, CAI Ruichu, HAO Zhifeng, WANG Lijuan. Emotion classification for news readers based on multi-category semantic word clusters[J]. Journal of Computer Applications, 2016, 36(8): 2076-2081.
