Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3700-3707. DOI: 10.11772/j.issn.1001-9081.2021101779

• Artificial intelligence

Aspect-level cross-domain sentiment analysis based on capsule network

Jiana MENG1, Pin LYU1, Yuhai YU1, Shichang SUN1, Hongfei LIN2

  1. College of Computer Science and Engineering, Dalian Minzu University, Dalian Liaoning 116600, China
  2. School of Computer Science and Technology, Dalian University of Technology, Dalian Liaoning 116024, China
  • Received:2021-10-18 Revised:2021-12-29 Accepted:2022-01-14 Online:2022-01-24 Published:2022-12-10
  • Contact: Yuhai YU
  • About author: MENG Jiana, born in 1972, Ph.D., professor. Her research interests include machine learning and text mining.
    LYU Pin, born in 1996, M.S. candidate. Her research interests include sentiment tendency analysis.
    SUN Shichang, born in 1979, Ph.D., associate professor. His research interests include transfer learning.
    LIN Hongfei, born in 1962, Ph.D., professor. His research interests include text mining and information retrieval.
  • Supported by:
    National Natural Science Foundation of China (61876031); 2019 Scientific Research Funded Project of Liaoning Provincial Department of Education (LJYT201906)

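基于胶囊网络的方面级跨领域情感分析 (Aspect-level cross-domain sentiment analysis based on capsule network)

孟佳娜 (Jiana MENG)1, 吕品 (Pin LYU)1, 于玉海 (Yuhai YU)1, 孙世昶 (Shichang SUN)1, 林鸿飞 (Hongfei LIN)2

  1. College of Computer Science and Engineering, Dalian Minzu University, Dalian, Liaoning 116600
  2. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
  • Corresponding author: 于玉海 (Yuhai YU)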
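  • About the authors: 孟佳娜 (MENG Jiana), born in 1972, female, native of Siping, Jilin, professor, Ph.D., CCF member. Main research interests: machine learning, text mining.
    吕品 (LYU Pin), born in 1996, female, native of Hohhot, Inner Mongolia, M.S. candidate, CCF member. Main research interests: sentiment tendency analysis.
    孙世昶 (SUN Shichang), born in 1979, male, native of Dalian, Liaoning, associate professor, Ph.D., CCF member. Main research interests: transfer learning.
    林鸿飞 (LIN Hongfei), born in 1962, male, native of Liaoyuan, Jilin, professor, Ph.D., CCF member. Main research interests: text mining, information retrieval.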
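  • Funding:
    National Natural Science Foundation of China (61876031); 2019 Scientific Research Funded Project of the Liaoning Provincial Department of Education (LJYT201906)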

Abstract:

In cross-domain sentiment analysis, labeled samples in the target domain are severely insufficient, feature distributions differ greatly across domains, and the sentiment polarity expressed by features in one domain can differ considerably from that in another; all of these problems lead to low classification accuracy. To deal with these problems, an aspect-level cross-domain sentiment analysis method based on capsule network was proposed. Firstly, the feature representations of text were obtained by the BERT (Bidirectional Encoder Representations from Transformers) pre-trained model. Secondly, for fine-grained aspect-level sentiment features, a Recurrent Neural Network (RNN) was used to fuse the context features and aspect features. Thirdly, a capsule network with dynamic routing was used to distinguish overlapping features, and the sentiment classification model was constructed on the basis of the capsule network. Finally, a small amount of data in the target domain was used to fine-tune the model to realize cross-domain transfer learning. The optimal F1 score of the proposed method is 95.7% on the Chinese dataset and 91.8% on the English dataset, which effectively addresses the low accuracy caused by insufficient training samples.

Key words: aspect-level sentiment analysis, cross-domain, capsule network, Recurrent Neural Network (RNN), pre-training
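
The abstract names capsule networks with dynamic routing but gives no implementation details. As an illustration only, the following minimal sketch shows routing-by-agreement in its commonly described form, assuming three routing iterations and toy prediction vectors. The function names (`squash`, `softmax`, `dynamic_routing`), the variable `u_hat`, and all numbers are hypothetical and are not taken from the paper.

```python
import math

def squash(s, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_routing(u_hat, num_iters=3):
    """Route lower-level capsules to upper-level capsules by agreement.

    u_hat[i][j] is the prediction vector that lower capsule i makes for
    upper capsule j. Returns (v, c): the upper capsule outputs and the
    coupling coefficients used in the last iteration.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for _ in range(num_iters):
        # Each lower capsule distributes its output across upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of predictions for upper capsule j, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the capsule output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c

# Toy input (assumed): 3 lower capsules, 2 upper capsules, 2-dimensional vectors.
# Capsules 0 and 1 agree on upper capsule 0; capsule 2 weakly favors capsule 1.
u_hat = [
    [[2.0, 0.0], [0.1, 0.0]],
    [[1.8, 0.2], [0.0, 0.1]],
    [[0.1, 0.0], [0.0, 0.5]],
]
v, c = dynamic_routing(u_hat, num_iters=3)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
print(lengths)
```

In this formulation the length of each output vector is read as the confidence that the corresponding class is present, so a classifier built on it would pick the upper capsule with the longest vector. Overlapping features are separated because a lower capsule's coupling shifts toward the upper capsule whose output agrees with its prediction.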

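Abstract (translated from the Chinese original):

In cross-domain sentiment analysis tasks, labeled samples in the target domain are severely insufficient, feature distributions differ considerably across domains, and the sentiment polarities expressed by features also differ greatly; all of these problems lead to low classification accuracy. To address these problems, an aspect-level cross-domain sentiment analysis method based on capsule network is proposed. First, feature representations of the text are obtained with the BERT pre-trained model. Second, for fine-grained aspect-level sentiment features, a recurrent neural network (RNN) is used to fuse context features with aspect features. Then, a capsule network with dynamic routing is used to distinguish overlapping features, and a sentiment classification model is built on the capsule network. Finally, a small amount of target-domain data is used to fine-tune the model to achieve cross-domain transfer learning. The best F1 score of the proposed method reaches 95.7% on the Chinese dataset and 91.8% on the English dataset, effectively addressing the low accuracy caused by insufficient training samples.

Key words: aspect-level sentiment analysis, cross-domain, capsule network, recurrent neural network, pre-training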

CLC Number: