Journal of Computer Applications, 2023, Vol. 43, Issue 4: 1056-1061. DOI: 10.11772/j.issn.1001-9081.2022030469

• Artificial intelligence

Fine-grained emotion classification of Chinese microblog based on syntactic dependency graph

Cheng FANG¹, Bei LI², Ping HAN¹, Qiong WU³

  1. College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
  2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300, China
  3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2022-04-13 Revised: 2022-09-27 Accepted: 2022-09-28 Online: 2023-04-11 Published: 2023-04-10
  • Contact: Cheng FANG
  • About the authors: LI Bei, born in 1993, M.S. candidate. Her research interests include natural language processing.
    HAN Ping, born in 1966, Ph.D., professor. Her research interests include signal and information processing, and Synthetic Aperture Radar (SAR) target detection and recognition.
    WU Qiong, born in 1981, Ph.D. Her research interests include Internet tendency analysis and big data mining.
  • Supported by:
    Civil Aviation Safety Capacity Building Fund of CAAC (14002500000019J014)

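Fine-grained emotion classification of Chinese microblog based on syntactic dependency graph

FANG Cheng (方澄)¹, LI Bei (李贝)², HAN Ping (韩萍)¹, WU Qiong (吴琼)³

  1. College of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300
  2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300300
  3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  • Corresponding author: FANG Cheng (方澄)
  • About the authors: LI Bei (李贝), born in 1993, female, from Mianyang, Sichuan, M.S. candidate. Her main research interest is natural language processing.
    HAN Ping (韩萍), born in 1966, female, from Tianjin, professor, Ph.D. Her main research interests are signal and information processing, and Synthetic Aperture Radar (SAR) target detection and recognition.
    WU Qiong (吴琼), born in 1981, female, from Beijing, Ph.D. Her main research interests are Internet tendency analysis and big data mining.
  • Supported by:
    Civil Aviation Safety Capacity Building Fund of the Civil Aviation Administration of China (14002500000019J014)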

Abstract:

Emotion analysis can quickly and accurately mine users' emotional tendencies from their posts, and it has a large application market. To address the complexity and diversity of syntactic structures in microblog language, a Syntax Graph Convolution Network (SGCN) model was proposed for fine-grained emotion classification of Chinese microblogs. The proposed model captures both structural and semantic information. In the model, a text graph was constructed from the dependency relations between words, and the degree of correlation between words was quantified by Pointwise Mutual Information (PMI). The PMI value was then used as the weight of the corresponding edge to represent the structural information of the sentence. Semantic features fused with position information were taken as the initial node features to enrich the semantic information of the nodes in the text graph. Experimental results on the microblog emotion classification dataset of Social Media Processing 2020 (SMP2020) show that, on two sets of microblog data covering six emotion categories (happiness, sadness, anger, fear, surprise, and no emotion), the average F1-score of the proposed model reaches 72.64%, which is 2.75 and 3.87 percentage points higher than those of the BERT (Bidirectional Encoder Representations from Transformers) Graph Convolutional Network (BGCN) model and the Text Level Graph Neural Network (Text-Level-GNN) model, respectively. This verifies that the proposed model uses the structural information of sentences more effectively than the compared deep learning models to improve classification performance.
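
As a rough illustration of the graph construction summarized in the abstract, the Python sketch below weights dependency edges by PMI and applies one graph-convolution step. It is not the authors' implementation. The sliding-window PMI estimate, the dropping of non-positive PMI values, the self-loops, the symmetric normalization, and the helper names `pmi_table`, `weighted_adjacency` and `gcn_layer` are all assumptions made for the example. The hand-written dependency arcs stand in for the output of a dependency parser, and the random vectors stand in for the position-fused semantic features.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def pmi_table(sentences, window=3):
    """Estimate PMI for unordered word pairs from sliding-window counts."""
    word_windows = Counter()  # windows containing each word
    pair_windows = Counter()  # windows containing each unordered pair
    total = 0                 # total number of windows
    for tokens in sentences:
        if not tokens:
            continue
        n_windows = max(1, len(tokens) - window + 1)
        for start in range(n_windows):
            seen = set(tokens[start:start + window])
            total += 1
            word_windows.update(seen)
            pair_windows.update(
                frozenset(p) for p in combinations(sorted(seen), 2)
            )
    pmi = {}
    for pair, n_ab in pair_windows.items():
        a, b = tuple(pair)
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )
        pmi[pair] = math.log(n_ab * total / (word_windows[a] * word_windows[b]))
    return pmi


def weighted_adjacency(tokens, dep_edges, pmi):
    """Adjacency matrix with PMI weights on dependency edges.

    Assumption: edges with non-positive PMI are dropped and every node
    gets a self-loop of weight 1.
    """
    n = len(tokens)
    adj = np.eye(n)
    for head, dep in dep_edges:
        w = pmi.get(frozenset((tokens[head], tokens[dep])), 0.0)
        if w > 0:
            adj[head, dep] = adj[dep, head] = w
    return adj


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(D^-1/2 A D^-1/2 X W)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)


# Toy corpus of already-segmented posts (English glosses for readability).
corpus = [
    ["i", "feel", "happy"],
    ["i", "feel", "sad"],
    ["today", "so", "happy"],
    ["rain", "so", "sad"],
]
tokens = corpus[0]
dep_edges = [(1, 0), (1, 2)]  # hypothetical (head, dependent) index pairs

pmi = pmi_table(corpus)
adj = weighted_adjacency(tokens, dep_edges, pmi)

rng = np.random.default_rng(0)
features = rng.normal(size=(len(tokens), 8))  # placeholder node features
weight = rng.normal(size=(8, 6))              # 6 outputs, one per emotion class
hidden = gcn_layer(adj, features, weight)
print(adj.round(3))
print(hidden.shape)
```

In a fuller model, several such layers would typically be followed by a pooling step and a softmax over the six emotion classes; how SGCN arranges these parts is not specified in the abstract.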

Key words: microblog, emotion analysis, Graph Convolutional Network (GCN), text graph, deep learning

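Abstract:

Emotion analysis can quickly and accurately mine users' emotional tendencies from their posts and has a very large application market. To address the complex and varied syntactic structures of microblog language, a graph convolutional network model based on syntactic dependency structure (SGCN) was proposed for fine-grained emotion classification of Chinese microblogs. The proposed model is rich in both structural and semantic expression: a text graph was built from the dependency relations between words, and the degree of correlation between words was quantified by pointwise mutual information (PMI) and used as the weight of the corresponding edge to fully represent the structural information of the sentence; semantic features fused with position information were used as the initial node features to enrich the semantic features of the nodes in the text graph. To verify the performance of the proposed model, two sets of microblog emotion data covering six categories (happiness, sadness, anger, fear, surprise, and no emotion) from the SMP2020 (Social Media Processing 2020) microblog emotion classification dataset were analyzed. Experimental results show that the average F1-score of the proposed model reaches 72.64%, which is 2.75 and 3.87 percentage points higher than those of the BERT (Bidirectional Encoder Representations from Transformers) word-vector-feature graph convolutional network (BGCN) model and the text-level graph neural network (Text-Level-GNN) model, respectively, verifying that the proposed model uses the structural information of sentences more effectively and improves classification performance.

Key words: microblog, emotion analysis, graph convolutional network, text graph, deep learning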

CLC Number: