Journal of Computer Applications ›› Vol. ›› Issue (): 18-23. DOI: 10.11772/j.issn.1001-9081.2024030297

• Artificial intelligence •

Cross-lingual knowledge transfer method based on alignment of representational space structures

Siyuan REN1,2, Cheng PENG1,2, Ke CHEN1,2, Zhiyi HE1,2

  1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu Sichuan 610213, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2024-03-18  Revised: 2024-04-01  Accepted: 2024-04-07  Online: 2025-01-24  Published: 2024-12-31
  • Contact: Cheng PENG

  • About the authors: REN Siyuan (born 1997), male, native of Jincheng, Shanxi; M.S. candidate; main research interests: natural language processing.
    PENG Cheng (born 1976), male, native of Chengdu, Sichuan; senior engineer; main research interests: software engineering, intelligent recognition.
    CHEN Ke (born 1996), male, native of Xuzhou, Jiangsu; Ph.D. candidate; main research interests: natural language processing.
    HE Zhiyi (born 1996), male, native of Kaili, Guizhou; M.S. candidate; main research interests: natural language processing.

Abstract:

In the field of Natural Language Processing (NLP), contrastive learning, as an efficient method for sentence representation learning, effectively mitigates the anisotropy of Transformer-based pre-trained language models and significantly enhances the quality of sentence representations. However, existing research focuses on English, especially under supervised settings. Due to the lack of labeled data, it is difficult to utilize contrastive learning effectively to obtain high-quality sentence representations in most non-English languages. To address this issue, a cross-lingual knowledge transfer method for contrastive learning models was proposed, which transfers knowledge across languages by aligning the structures of different languages' representation spaces. On this basis, a simple and effective cross-lingual knowledge transfer framework, TransCSE, was designed to transfer the knowledge of supervised English contrastive learning models to non-English models. Through knowledge transfer experiments in six directions, from English to English, French, Arabic, Spanish, Turkish, and Chinese, TransCSE transferred the knowledge of the supervised contrastive learning model SimCSE (Simple Contrastive learning of Sentence Embeddings) to the multilingual pre-trained language model mBERT (multilingual Bidirectional Encoder Representations from Transformers). Experimental results show that, compared with the original mBERT, the model trained with the TransCSE framework achieves accuracy improvements of 17.95 and 43.27 percentage points on the XNLI (Cross-lingual Natural Language Inference) and STS (Semantic Textual Similarity) 2017 benchmark datasets, respectively, verifying the effectiveness of TransCSE. Moreover, compared with cross-lingual knowledge transfer methods based on shared parameters and on representation alignment, TransCSE achieves the best performance on both the XNLI and STS 2017 benchmark datasets.

Key words: Natural Language Processing (NLP), contrastive learning, cross-lingual knowledge transfer, multilingual pre-trained model, alignment of representational space structures
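
The abstract describes the mechanism only at a high level: knowledge is transferred from a supervised English SimCSE teacher to an mBERT student by aligning the structures of their representation spaces. The sketch below is a minimal, hypothetical illustration of that idea as relational distillation on parallel sentences; the checkpoint names, the [CLS] pooling, the pairwise cosine-similarity "structure" matrices, and the MSE alignment loss are assumptions made for illustration, not the exact TransCSE objective.

# Hypothetical sketch of structure-alignment distillation on parallel sentences
# (illustrative only; checkpoints, pooling, and loss are assumptions, not TransCSE itself).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

TEACHER = "princeton-nlp/sup-simcse-bert-base-uncased"  # supervised English SimCSE
STUDENT = "bert-base-multilingual-cased"                # mBERT

teacher_tok = AutoTokenizer.from_pretrained(TEACHER)
student_tok = AutoTokenizer.from_pretrained(STUDENT)
teacher = AutoModel.from_pretrained(TEACHER).eval()     # frozen teacher
student = AutoModel.from_pretrained(STUDENT).train()    # student being trained

def encode(model, tok, sentences):
    # Use the [CLS] vector as the sentence embedding (one possible pooling choice).
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    return model(**batch).last_hidden_state[:, 0]

def structure(embeddings):
    # Pairwise cosine-similarity matrix: one description of a space's "structure".
    normed = F.normalize(embeddings, dim=-1)
    return normed @ normed.T

def structure_alignment_loss(english_sentences, target_sentences):
    # Align the student's target-language similarity structure with the teacher's
    # English similarity structure over a batch of parallel sentences.
    with torch.no_grad():
        teacher_struct = structure(encode(teacher, teacher_tok, english_sentences))
    student_struct = structure(encode(student, student_tok, target_sentences))
    return F.mse_loss(student_struct, teacher_struct)

# Toy usage on one English-French parallel batch.
en = ["A man is playing guitar.", "Two dogs run on the beach.", "She is reading a book."]
fr = ["Un homme joue de la guitare.", "Deux chiens courent sur la plage.", "Elle lit un livre."]
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)
loss = structure_alignment_loss(en, fr)
loss.backward()
optimizer.step()
print(f"structure alignment loss: {loss.item():.4f}")

Because only the batch-level similarity matrices are compared, the teacher and student do not need identical embedding dimensions, which is one practical attraction of aligning structures rather than individual vectors.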

CLC Number: