Journal of Computer Applications ›› Vol. ›› Issue (): 18-23. DOI: 10.11772/j.issn.1001-9081.2024030297

• Artificial intelligence •

Cross-lingual knowledge transfer method based on alignment of representational space structures

Siyuan REN1,2, Cheng PENG1,2, Ke CHEN1,2, Zhiyi HE1,2

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610213, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-03-18  Revised: 2024-04-01  Accepted: 2024-04-07  Online: 2025-01-24  Published: 2024-12-31
  • Contact: Cheng PENG

  • About the authors: REN Siyuan (born 1997), male, native of Jincheng, Shanxi; M.S. candidate; main research interests: natural language processing.
    PENG Cheng (born 1976), male, native of Chengdu, Sichuan; senior engineer; main research interests: software engineering, intelligent recognition.
    CHEN Ke (born 1996), male, native of Xuzhou, Jiangsu; Ph.D. candidate; main research interests: natural language processing.
    HE Zhiyi (born 1996), male, native of Kaili, Guizhou; M.S. candidate; main research interests: natural language processing.

Abstract:

In the field of Natural Language Processing (NLP), contrastive learning, as an efficient method for sentence representation learning, effectively mitigates the anisotropy of Transformer-based pre-trained language models and significantly enhances the quality of sentence representations. However, existing research focuses on English, especially under supervised settings. Due to the lack of labeled data, it is difficult to utilize contrastive learning effectively to obtain high-quality sentence representations in most non-English languages. To address this issue, a cross-lingual knowledge transfer method for contrastive learning models was proposed, which transfers knowledge across languages by aligning the structures of different languages' representation spaces. On this basis, a simple and effective cross-lingual knowledge transfer framework, TransCSE, was designed to transfer the knowledge of supervised English contrastive learning models to non-English models. Through knowledge transfer experiments in six directions, from English to English, French, Arabic, Spanish, Turkish, and Chinese, TransCSE transferred the knowledge of the supervised contrastive learning model SimCSE (Simple Contrastive learning of Sentence Embeddings) to the multilingual pre-trained language model mBERT (multilingual Bidirectional Encoder Representations from Transformers). Experimental results show that, compared with the original mBERT, the model trained with the TransCSE framework achieves accuracy improvements of 17.95 and 43.27 percentage points on the XNLI (Cross-lingual Natural Language Inference) and STS (Semantic Textual Similarity) 2017 benchmark datasets, respectively, verifying the effectiveness of TransCSE. Moreover, compared with cross-lingual knowledge transfer methods based on shared parameters and on representation alignment, TransCSE achieves the best performance on both the XNLI and STS 2017 benchmark datasets.

Key words: Natural Language Processing (NLP), contrastive learning, cross-lingual knowledge transfer, multilingual pre-trained model, alignment of representational space structures
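
The abstract describes the mechanism only at a high level: knowledge is transferred from a supervised English SimCSE teacher to an mBERT student by aligning the structures of their representation spaces. The sketch below is a minimal, hypothetical illustration of that idea as relational distillation on parallel sentences; the checkpoint names, the [CLS] pooling, the pairwise cosine-similarity "structure" matrices, and the MSE alignment loss are assumptions made for illustration, not the exact TransCSE objective.

# Hypothetical sketch of structure-alignment distillation on parallel sentences
# (illustrative only; checkpoints, pooling, and loss are assumptions, not TransCSE itself).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

TEACHER = "princeton-nlp/sup-simcse-bert-base-uncased"  # supervised English SimCSE
STUDENT = "bert-base-multilingual-cased"                # mBERT

teacher_tok = AutoTokenizer.from_pretrained(TEACHER)
student_tok = AutoTokenizer.from_pretrained(STUDENT)
teacher = AutoModel.from_pretrained(TEACHER).eval()     # frozen teacher
student = AutoModel.from_pretrained(STUDENT).train()    # student being trained

def encode(model, tok, sentences):
    # Use the [CLS] vector as the sentence embedding (one possible pooling choice).
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    return model(**batch).last_hidden_state[:, 0]

def structure(embeddings):
    # Pairwise cosine-similarity matrix: one description of a space's "structure".
    normed = F.normalize(embeddings, dim=-1)
    return normed @ normed.T

def structure_alignment_loss(english_sentences, target_sentences):
    # Align the student's target-language similarity structure with the teacher's
    # English similarity structure over a batch of parallel sentences.
    with torch.no_grad():
        teacher_struct = structure(encode(teacher, teacher_tok, english_sentences))
    student_struct = structure(encode(student, student_tok, target_sentences))
    return F.mse_loss(student_struct, teacher_struct)

# Toy usage on one English-French parallel batch.
en = ["A man is playing guitar.", "Two dogs run on the beach.", "She is reading a book."]
fr = ["Un homme joue de la guitare.", "Deux chiens courent sur la plage.", "Elle lit un livre."]
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
loss = structure_alignment_loss(en, fr)
loss.backward()
optimizer.step()
print(f"structure alignment loss: {loss.item():.4f}")

Because only the batch-level similarity matrices are compared, the teacher and student do not need identical embedding dimensions, which is one practical attraction of aligning structures rather than individual vectors.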

CLC Number: