Journal of Computer Applications (《计算机应用》), 2022, Vol. 42, Issue (4): 1108-1115. DOI: 10.11772/j.issn.1001-9081.2021071180

• The 36th CCF National Conference of Computer Applications (CCF NCCA 2021)

Text sentiment analysis method combining generalized autoregressive pre-training language model and recurrent convolutional neural network

Lie PAN¹, Cheng ZENG¹,²,³, Haifeng ZHANG¹, Chaodong WEN¹, Rusong HAO¹, Peng HE¹,²,³

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China
  2. Engineering and Technical Research Center of Hubei Province in Software Engineering (Hubei University), Wuhan Hubei 430062, China
  3. Engineering Research Center of Hubei Province in Intelligent Government Affairs and Application of Artificial Intelligence (Hubei University), Wuhan Hubei 430062, China
  • Received: 2021-07-08; Revised: 2021-08-27; Accepted: 2021-08-31; Online: 2021-09-08; Published: 2022-04-10
  • Corresponding author: Cheng ZENG
  • About the authors: PAN Lie, born in 1997, native of Huanggang, Hubei, M.S. candidate. His research interests include natural language processing and text classification.
    ZHANG Haifeng, born in 1990, native of Huanggang, Hubei, M.S. candidate. His research interests include natural language processing and text classification.
    WEN Chaodong, born in 1996, native of Jingzhou, Hubei, M.S. candidate, CCF member. His research interests include natural language processing and text classification.
    HAO Rusong, born in 1996, native of Kaifeng, Henan, M.S. candidate. His research interests include natural language processing and text classification.
    HE Peng, born in 1988, native of Wuhan, Hubei, Ph.D., professor. His research interests include artificial intelligence and recommender systems.
  • Supported by:
    National Natural Science Foundation of China (61977021)

Abstract:

Traditional machine learning methods fail to fully exploit semantic and associative information when classifying the sentiment polarity of online comment text. Existing deep learning methods can extract semantic and contextual information, but the extraction is often unidirectional, so they fall short in capturing the deep semantic information of comment text. To address these problems, a text sentiment analysis method was proposed that combines the generalized autoregressive pretraining language model XLNet with the Recurrent Convolutional Neural Network (RCNN). Firstly, XLNet was used to represent the text features. Its segment-level recurrence mechanism and relative positional encoding made full use of the contextual information of the comment text, effectively improving the expressiveness of the text features. Then, RCNN was used to process the text features bidirectionally and extract the contextual semantic information of the text at a deeper level, improving the overall performance on the sentiment analysis task. Experiments were carried out on three public datasets, weibo-100k, waimai-10k and ChnSentiCorp. The proposed method achieved accuracies of 96.4%, 91.8% and 92.9% respectively, which demonstrates its effectiveness in the sentiment analysis task.

Key words: comment text, sentiment analysis, XLNet, segment-level recurrence mechanism, Recurrent Convolutional Neural Network (RCNN)
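
The abstract describes a two-stage pipeline in which XLNet produces token-level features and an RCNN reads them in both directions. The sketch below illustrates only the RCNN stage. It follows the original RCNN formulation of Lai et al. (2015): a left and a right recurrent context for each token, concatenated with the token vector, then a tanh projection, max-pooling over time and a softmax output.

Several assumptions here are not taken from this paper, whose exact RCNN configuration is not given in the abstract:

- a plain tanh recurrence instead of LSTM/GRU cells;
- zero initial contexts;
- untrained random weights;
- the hypothetical helper names `init_params` and `rcnn_forward`;
- a random matrix standing in for XLNet's token outputs (768 is the hidden size of XLNet-base).

```python
import numpy as np


def init_params(d, h, k, num_classes, seed=0):
    """Randomly initialize a hypothetical RCNN head (untrained weights).

    d: token feature size, h: context-vector size,
    k: latent semantic size, num_classes: number of sentiment labels.
    """
    rng = np.random.default_rng(seed)
    s = 0.1  # small init scale keeps tanh away from saturation
    return {
        "W_l": rng.normal(0.0, s, (h, h)),         # left-context recurrence
        "W_sl": rng.normal(0.0, s, (h, d)),        # previous token -> left context
        "W_r": rng.normal(0.0, s, (h, h)),         # right-context recurrence
        "W_sr": rng.normal(0.0, s, (h, d)),        # next token -> right context
        "W2": rng.normal(0.0, s, (k, 2 * h + d)),  # projection of [c_l; e; c_r]
        "b2": np.zeros(k),
        "W4": rng.normal(0.0, s, (num_classes, k)),
        "b4": np.zeros(num_classes),
    }


def rcnn_forward(e, p):
    """RCNN forward pass; e has shape (n, d), one feature vector per token."""
    n = e.shape[0]
    h = p["W_l"].shape[0]

    # Left context: c_l[i] = tanh(W_l c_l[i-1] + W_sl e[i-1]), with c_l[0] = 0.
    c_left = np.zeros((n, h))
    for i in range(1, n):
        c_left[i] = np.tanh(p["W_l"] @ c_left[i - 1] + p["W_sl"] @ e[i - 1])

    # Right context: c_r[i] = tanh(W_r c_r[i+1] + W_sr e[i+1]), with c_r[n-1] = 0.
    c_right = np.zeros((n, h))
    for i in range(n - 2, -1, -1):
        c_right[i] = np.tanh(p["W_r"] @ c_right[i + 1] + p["W_sr"] @ e[i + 1])

    # Each token becomes [left context; token feature; right context].
    x = np.concatenate([c_left, e, c_right], axis=1)  # (n, 2h + d)
    y2 = np.tanh(x @ p["W2"].T + p["b2"])              # (n, k)

    # Max-pooling over time keeps the strongest response in each dimension.
    y3 = y2.max(axis=0)                                # (k,)

    # Output layer with a numerically stable softmax.
    logits = p["W4"] @ y3 + p["b4"]
    z = np.exp(logits - logits.max())
    return z / z.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    tokens = rng.normal(size=(12, 768))  # stand-in for 12 XLNet token vectors
    params = init_params(d=768, h=64, k=100, num_classes=2)
    probs = rcnn_forward(tokens, params)
    print(probs.shape, probs.sum())
```

Max-pooling over time makes the head's output size independent of comment length, so the same weights handle short and long texts. In a real system these weights would be trained, for example with a cross-entropy loss, on top of or jointly with XLNet.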

Chinese Library Classification (CLC) number: