Journal of Computer Applications

Aspect-level sentiment triplet extraction based on graph convolutional network and cross-domain data enhancement

  

  • Received: 2025-07-21; Revised: 2025-10-15; Online: 2025-11-05; Published: 2025-11-05

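CHEN Musheng, FU Wenqing, QIU Xiaohong, WU Junhua, WEN Qiang

  1. Jiangxi University of Science and Technology
  • Corresponding author: FU Wenqing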

Abstract: Aspect-level sentiment triplet extraction identifies sentiment entities, their attributes, and the associated sentiment polarities, enabling precise analysis of the sentiment associations expressed in user opinions. It thus supports product and service optimization, public opinion monitoring, and consumer decision-making. Traditional methods struggle to model long-range semantic dependencies, and labeled target-domain data is scarce. To address these problems, a graph convolutional network model with cross-domain data augmentation was proposed for aspect-level sentiment triplet extraction. In the cross-domain data augmentation stage, pseudo-labels based on maximum mean discrepancy were first generated for unlabeled target-domain data to mitigate domain shift. A domain-adaptive language model was then trained to capture target-domain-specific semantics, and this model was used for autoregressive data generation to further increase the volume and diversity of labeled target-domain data. In the triplet extraction stage, the data augmented in the first stage was used to fine-tune the graph convolutional network-based language model produced during pseudo-label generation, and the fine-tuned model was then used to extract sentiment triplets. Experimental results on the ASTE-DATA-V2 dataset show that the proposed method outperforms benchmark approaches such as BGCA (Bidirectional Generative Cross-domain ABSA), FOAL (Fine-grained cOntrAstive Learning), and HiPM-hard (Hybrid Prompts Mixture). Compared with the baseline BGCA, the average F1 score increases by 0.82 percentage points, with gains ranging from 0.63 to 0.94 percentage points, improving the accuracy and stability of cross-domain sentiment triplet extraction.
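
The abstract states that pseudo-labels are generated with the help of maximum mean discrepancy (MMD) but does not give the estimator, kernel, or features involved. The sketch below assumes a Gaussian (RBF) kernel over sentence-level feature vectors and computes the standard biased empirical estimate of squared MMD between a source sample and a target sample. The function names, the bandwidth `sigma`, and the toy vectors are hypothetical, and how the paper turns this quantity into pseudo-labels is not shown here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)) (assumed kernel)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_biased(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between xs (source) and ys (target).

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    """
    m, n = len(xs), len(ys)
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical 2-D features standing in for encoder outputs.
source = [[0.0, 0.0], [0.1, 0.0]]
target_near = [[0.0, 0.1], [0.1, 0.1]]
target_far = [[3.0, 3.0], [3.1, 3.0]]
print(mmd2_biased(source, target_near), mmd2_biased(source, target_far))
```

A smaller value means the two feature distributions are closer. Minimizing such a term, or using it to judge which target samples resemble the source domain, are common ways to reduce domain shift; the abstract does not say which of these the authors do.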

Key words: Graph Convolutional Network (GCN)
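
The abstract names a graph convolutional network but does not describe its layers or the graph it runs on. A widely used formulation (Kipf and Welling) computes H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, I the identity, and D the degree matrix of A + I; in aspect-based sentiment work the graph is often the dependency parse of the sentence. The sketch below assumes that setup. The example sentence, its dependency arcs, and the weight matrix are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, treating each edge as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A + I
    for head, dep in edges:
        a[head][dep] = a[dep][head] = 1.0
    deg = [sum(row) for row in a]  # degree of each node in A + I
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adj_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(adj_norm @ h @ w); each token mixes in its neighbours."""
    out = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical sentence and dependency arcs given as (head index, dependent index).
tokens = ["battery", "life", "is", "great"]
edges = [(1, 0), (3, 1), (3, 2)]
adj = normalized_adjacency(len(tokens), edges)
h0 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot features
w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2], [0.2, 0.1]]  # hypothetical weights
h1 = gcn_layer(adj, h0, w)
print(h1)
```

Because "great" is linked to "life" and "life" to "battery", two stacked layers let information pass between the opinion word and the full aspect term however many words separate them in the surface string, which is the usual motivation for using a syntactic graph to handle long-range dependencies.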

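
The abstract reports its results as F1 scores but does not spell out the matching rule. A common ASTE protocol, assumed here, counts a predicted triplet as correct only if its aspect, opinion, and polarity all exactly match a gold triplet of the same sentence, and computes micro F1 over the whole corpus. The function name and the triplets below are hypothetical.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (aspect term, opinion term, polarity)


def triplet_f1(preds: List[List[Triplet]], golds: List[List[Triplet]]) -> float:
    """Micro F1 over sentences, with exact matching of whole triplets (assumed protocol)."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must cover the same sentences")
    n_pred = n_gold = n_correct = 0
    for pred, gold in zip(preds, golds):
        pred_set, gold_set = set(pred), set(gold)
        n_pred += len(pred_set)
        n_gold += len(gold_set)
        n_correct += len(pred_set & gold_set)
    if n_correct == 0:
        return 0.0
    precision = n_correct / n_pred
    recall = n_correct / n_gold
    return 2 * precision * recall / (precision + recall)


golds = [[("battery life", "great", "POS")],
         [("screen", "dim", "NEG"), ("price", "fair", "POS")]]
preds = [[("battery life", "great", "POS")],
         [("screen", "dim", "NEU"), ("price", "fair", "POS")]]  # one wrong polarity
f1 = triplet_f1(preds, golds)
print(f"F1 = {f1 * 100:.2f}%")
```

The F1 gains in the abstract are differences between such percentages, so "0.82 percentage points" is an absolute increase (a hypothetical score of 60.00% would become 60.82%), not a relative increase of 0.82%.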