Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 2829-2834. DOI: 10.11772/j.issn.1001-9081.2020121900

Special topic: Artificial Intelligence

• Artificial Intelligence •

Text sentiment analysis based on sentiment lexicon and context language model

YANG Shuxin, ZHANG Nan   

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000, China
  • Received: 2020-12-04 Revised: 2021-04-14 Online: 2021-07-14 Published: 2021-10-10
  • Corresponding author: YANG Shuxin
  • About the authors: YANG Shuxin (born 1978), male, from Jiujiang, Jiangxi; Ph.D., associate professor, CCF member. His research interests include natural language processing, graph data management, and bioinformatics. ZHANG Nan (born 1994), male, from Zhongwei, Ningxia; M.S. candidate. His research interests include natural language processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61662028), the Science and Technology Research Project of Education Department of Jiangxi Province (GJJ170518), and the Special Funds for Postgraduate Innovation of Jiangxi Province (YC2018-S331).

Abstract: Word embeddings play an important role in text sentiment analysis, but traditional word embedding techniques such as Word2Vec and GloVe (Global Vectors for word representation) assign each word a single representation regardless of its context. To address this problem, a text sentiment analysis model named Sentiment Lexicon Parallel-Embedding from Language Model (SLP-ELMo) is proposed, which combines a sentiment lexicon with the contextual language model Embedding from Language Model (ELMo). First, the sentiment lexicon is used to filter the words in each sentence. Second, the filtered words are fed into a character-level Convolutional Neural Network (char-CNN) to generate a character vector for each word. Then, the character vectors are fed into the ELMo model for training. In addition, an attention mechanism is added to the last layer of the ELMo vectors to train the word vectors better. Finally, the word vectors and the ELMo vectors are fused in parallel and fed into a classifier for text sentiment classification. Compared with several existing models, the proposed model achieves higher accuracy on both the IMDB and SST-2 datasets, which validates its effectiveness.

Key words: sentiment analysis, Embedding from Language Model (ELMo), sentiment lexicon, Convolutional Neural Network (CNN), character vector
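
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the char-CNN and ELMo encoders are omitted, and toy vectors stand in for their outputs. All function names are hypothetical, and three readings are assumptions: that lexicon filtering keeps the words found in the lexicon, that the attention is dot-product attention against a query vector, and that "parallel fusion" means concatenation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def filter_by_lexicon(tokens: Sequence[str], lexicon: Dict[str, float]) -> List[str]:
    """Keep tokens found in the sentiment lexicon (assumed filtering rule)."""
    kept = [t for t in tokens if t.lower() in lexicon]
    # Assumption: fall back to the full sentence when no lexicon word occurs.
    return kept if kept else list(tokens)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Dot-product attention over per-token contextual vectors."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a list of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def fuse_parallel(word_vec: Vector, context_vec: Vector) -> Vector:
    """Parallel fusion, read here as simple concatenation (assumption)."""
    return list(word_vec) + list(context_vec)


if __name__ == "__main__":
    # Toy lexicon and toy vectors; real vectors would come from trained encoders.
    lexicon = {"great": 1.0, "boring": -1.0}
    word_vectors = {"great": [1.0, 0.0], "boring": [0.0, 1.0]}
    context_states = {"great": [0.5, 0.5, 0.0], "boring": [0.0, 0.5, 0.5]}

    tokens = "The plot was great but the ending was boring".split()
    kept = filter_by_lexicon(tokens, lexicon)
    pooled = attention_pool([context_states[t] for t in kept], [0.0, 0.0, 0.0])
    fused = fuse_parallel(mean_pool([word_vectors[t] for t in kept]), pooled)
    print(kept, fused)  # the fused vector would be passed to a classifier
```

In this sketch the fused vector has the word-vector dimension plus the contextual-vector dimension, so a downstream classifier sees both views of the sentence side by side.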

CLC number: