Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (S2): 1-8. DOI: 10.11772/j.issn.1001-9081.2023040428
• Artificial intelligence •
Yuemei XU 1, Ling HU 1, Jiayi ZHAO 1, Wanze DU 2, Wenqing WANG 2
Received:
2023-04-20
Revised:
2023-08-17
Accepted:
2023-08-21
Online:
2024-01-09
Published:
2023-12-31
Contact:
Yuemei XU
Corresponding author:
Yuemei XU
About the author:
XU Yuemei (1985—), female, born in Wuzhou, Guangxi; associate professor, Ph. D.; her main research interest is cross-lingual natural language processing.
Supported by:
CLC Number:
Yuemei XU, Ling HU, Jiayi ZHAO, Wanze DU, Wenqing WANG. Research progress and enlightenment of large language models on multi-lingual intelligence[J]. Journal of Computer Applications, 2023, 43(S2): 1-8.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023040428
Model | Task | Supported languages | Parameter size | Strengths | Weaknesses |
---|---|---|---|---|---|
Multi-BERT[ | Zero-shot cross-lingual transfer | 104 languages | 110 million–340 million | Performs well on zero-shot cross-lingual tasks, especially when the source and target languages are similar | Shows systematic deficiencies in its multilingual representations for certain language pairs |
XLM[ | Cross-lingual representation for pre-trained models | Over 100 languages | 270 million | Uses parallel corpora to align model representations, improving the cross-lingual representation ability of pre-trained models | Training data is relatively small, especially for low-resource languages |
XLM-RoBERTa[ | Cross-lingual classification, sequence labeling and question answering | 100 languages | 550 million | Large-scale multilingual pre-training; performs well on cross-lingual classification, sequence labeling and question answering | Its vocabulary contains many code-mixed subwords, which keeps the model from capturing the intrinsic meaning of sentences |
MetaXL[ | Meta-learning framework that learns transferable language representations | All languages | — | Brings the target and source languages closer in the representation space, giving good transfer performance | Placing multiple transformation networks at several layers of the pre-trained model has not yet been explored |
mT5[ | Unifies multiple tasks in a text-to-text framework | 101 languages | 300 million–13 billion | Casts multilingual text classification, reading comprehension, summarization and other tasks as a single text-generation task | Essentially reuses the Transformer architecture without structural innovation |
ChatGPT[ | General-purpose large language model | 95 languages | 175 billion | The first large language model to support 95 languages, including English, Chinese, Spanish, French and German | Generation speed and quality in English are clearly better than in other languages |
ChatGLM[ | General-purpose large language model | Chinese and English | 6 billion | Supports Chinese and English, with strong Chinese dialogue generation | Focuses on improving Chinese performance and supports fewer languages than other general-purpose large language models |
BLOOM[ | Open multilingual language model | 46 languages | 176 billion | Supports 46 languages including English, Chinese, Vietnamese and Catalan; for most of them it is the first large language model to support the language | Does not resolve issues such as the ethical and language biases of multilingual models |
LLaMA[ | General-purpose large language model | 20 languages | 7 billion–65 billion | Trained only on publicly available data; more efficient and less compute-intensive than ChatGPT | Does not resolve issues such as the ethical and language biases of multilingual models |
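To make the zero-shot cross-lingual transfer setting described in the Multi-BERT and XLM-RoBERTa rows concrete, the following minimal sketch (not taken from the surveyed papers) loads a multilingual encoder and applies it to sentences in several languages; the checkpoint name `xlm-roberta-base` and the two-label classification head are illustrative assumptions.

```python
# Minimal sketch (assumption: the public Hugging Face checkpoint "xlm-roberta-base")
# of zero-shot cross-lingual transfer: the encoder is fine-tuned on labeled data in
# one language (e.g. English, omitted here) and then applied unchanged to other
# languages that share its subword vocabulary.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # multilingual encoder covering ~100 languages
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = [
    "This movie was wonderful.",        # English: the (hypothetical) training language
    "Cette pièce était merveilleuse.",  # French: zero-shot target language
    "这部电影非常精彩。",                # Chinese: zero-shot target language
]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
# Per-sentence class probabilities; only meaningful after fine-tuning on one language.
print(logits.softmax(dim=-1))
```

Because all languages are mapped into one shared representation space, the same classification head can be reused across languages, which is the property the zero-shot transfer rows in the table describe.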
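The mT5 row describes casting every task as text-to-text generation. The sketch below is again only illustrative, assuming the public `google/mt5-small` checkpoint and a made-up prompt format; a task-specific fine-tuning step, omitted here, is still required before the generated string becomes a usable label.

```python
# Minimal sketch (assumptions: the public "google/mt5-small" checkpoint and an
# illustrative prompt format) of the text-to-text framing used by mT5: the inputs
# and outputs of every task are plain strings, so classification, QA and
# summarization all reduce to sequence-to-sequence generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# A sentiment-classification instance rephrased as generation; after fine-tuning,
# the decoded string would be the class label (e.g. "positive").
prompt = "classify sentiment: 这部电影非常精彩。"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```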
1 | WEI J, TAY Y, BOMMASANI R, et al. Emergent abilities of large language models [EB/OL]. [2023-04-01].. |
2 | GOERTZEL B. Artificial general intelligence: concept, state of the art, and future prospects [J]. Journal of Artificial General Intelligence, 2014, 5(1): 1-48. 10.2478/jagi-2014-0001 |
3 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [EB/OL].[2023-04-01]. . 10.3126/jiee.v3i1.34327 |
4 | PENNINGTON J, SOCHER R, MANNING C D. GloVe: Global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1532-1543. 10.3115/v1/d14-1162 |
5 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 2017 Conference on Advances in Neural Information Processing Systems. Long Beach: NIPS, 2017: 5998-6008. |
6 | DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL. 2019: 4171-4186. 10.18653/v1/n18-2 |
7 | RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training [EB/OL]. [2023-04-01].. 10.4324/9781003267836-1 |
8 | RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners [EB/OL]. [2023-04-01].. |
9 | BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners [C]// Proceedings of the 34th Conference on Neural Information Processing Systems. Long Beach: NIPS, 2020, 33: 1877-1901. |
10 | JIAO W, WANG W, HUANG J-T, et al. Is ChatGPT a good translator? Yes with GPT-4 as the engine [EB/OL]. [2023-04-02]. . |
11 | OpenAI. ChatGPT plugins [EB/OL]. [2023-04-03]. . |
12 | RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer [J]. Journal of Machine Learning Research, 2020, 21(1):5485-5551. |
13 | KIM Y. Convolutional neural networks for sentence classification [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1746-1751. 10.3115/v1/d14-1181 |
14 | YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding [C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: NIPS, 2019: 5754-5764. |
15 | LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. [2023-04-03]. . |
16 | LAN Z, CHEN M, GOODMAN S, et al. ALBERT: A lite BERT for self-supervised learning of language representations [EB/OL]. [2023-04-03]. . 10.1109/slt48900.2021.9383575 |
17 | CLARK K, LUONG M T, LE Q V, et al. ELECTRA: Pre-training text encoders as discriminators rather than generators [EB/OL]. [2023-04-03]. . 10.18653/v1/2020.emnlp-main.20 |
18 | OUYANG L, WU J, JIANG X, et al. Training language models to follow instructions with human feedback [EB/OL]. [2023-04-03]. . |
19 | THOPPILAN R, DE FREITAS D, HALL J, et al. LaMDA: language models for dialog applications [EB/OL]. [2023-04-03].. |
20 | CHOWDHERY A, NARANG S, DEVLIN J, et al. PaLM: scaling language modeling with pathways [EB/OL]. [2023-04-03]. . |
21 | ANIL R, DAI A M, FIRAT O, et al. PaLM 2 technical report [EB/OL]. [2023-04-03]. . |
22 | TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: open and efficient foundation language models [EB/OL]. [2023-04-03]. . |
23 | CHIANG W L, LI Z, LIN Z, et al. Vicuna: an open-source chatbot impressing GPT-4 with 90%* ChatGPT quality [EB/OL]. [2023-04-03]. . |
24 | SMITH S, PATWARY M, NORICK B, et al. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model [EB/OL]. [2023-04-03]. . |
25 | ZENG W, REN X, SU T, et al. PanGu‑α: large-scale autoregressive pretrained Chinese language models with auto-parallel computation [EB/OL]. [2023-04-03]. . |
26 | REN X, ZHOU P, MENG X, et al. PanGu‑Σ: towards trillion parameter language model with sparse heterogeneous computing [EB/OL]. [2023-04-03]. . |
27 | ZENG A, LIU X, DU Z, et al. GLM-130B: an open bilingual pre-trained model[EB/OL]. [2023-04-03]. . |
28 | PIRES T, SCHLINGER E, GARRETTE D. How multi-lingual is multi-lingual BERT? [C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4996-5001. 10.18653/v1/p19-1493 |
29 | WU S, DREDZE M. Are all languages created equal in multi-lingual BERT? [C]// Proceedings of the 5th Workshop on Representation Learning for NLP. Stroudsburg, PA: ACL, 2020: 120-130. 10.18653/v1/2020.repl4nlp-1.16 |
30 | SCHWENK H, LI X. A corpus for multi-lingual document classification in eight languages [EB/OL]. [2023-04-03]. . |
31 | DONG X, DE MELO G. A robust self-learning framework for cross-lingual text classification [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2019: 6306-6310. 10.18653/v1/d19-1658 |
32 | LAMPLE G, CONNEAU A. Cross-lingual language model pretraining [C]// Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver: NIPS, 2019: 7057-7067. 10.18653/v1/d18-1549 |
33 | CONNEAU A, KHANDELWAL K, GOYAL N, et al. Unsupervised cross-lingual representation learning at scale [C]// Proceedings of the 58th Conference of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 8440-8451. 10.18653/v1/2020.acl-main.747 |
34 | HOSSAIN E, SHARIF O, HOQUE M M. NLP-CUET@LT-EDI-EACL2021: multi-lingual code-mixed hope speech detection using cross-lingual representation learner [EB/OL]. [2023-04-03]. . |
35 | PAN S J, YANG Q. A survey on transfer learning [J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 22(10): 1345-1359. 10.1109/tkde.2009.191 |
36 | XIA M, ZHENG G, MUKHERJEE S, et al. MetaXL: meta representation transformation for low-resource cross-lingual learning [C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.Stroudsburg, PA: ACL, 2021:499-511. 10.18653/v1/2021.naacl-main.42 |
37 | XUE L, CONSTANT N, ROBERTS A, et al. mT5: a massively multi-lingual pre-trained text-to-text transformer[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2021:483-498. 10.18653/v1/2021.naacl-main.41 |
38 | BUBECK S, CHANDRASEKARAN V, ELDAN R, et al. Sparks of artificial general intelligence: early experiments with GPT-4 [EB/OL]. [2023-04-03]. . 10.3390/a16090432 |
39 | SCAO T L, FAN A, AKIKI C, et al. BLOOM: a 176B-parameter open-access multi-lingual language model [EB/OL]. [2023-04-03]. . |
40 | FARRA N. Cross-lingual and low-resource sentiment analysis [D]. New York: Columbia University,2019: 1-266. |
41 | KURITA K, VYAS N, PAREEK A. Measuring bias in contextualized word representations [C]// Proceedings of the 1st Workshop on Gender Bias in Natural Language Processing. Stroudsburg, PA: ACL, 2019: 166-172. 10.18653/v1/w19-3823 |
42 | KANEKO M, BOLLEGALA D. Unmasking the mask-evaluating social biases in masked language models [C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2022: 11954-11962. 10.1609/aaai.v36i11.21453 |
43 | NADEEM M, BETHKE A, REDDY S. StereoSet: measuring stereotypical bias in pretrained language models [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2021:5356-5371. 10.18653/v1/2021.acl-long.416 |
44 | BLODGETT S L, BAROCAS S, DAUMÉ Ⅲ H, et al. Language (technology) is power: a critical survey of “bias” in NLP [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020:5454-5476. 10.18653/v1/2020.acl-main.485 |
45 | BHARDWAJ R, MAJUMDER N, PORIA S. Investigating gender bias in BERT [J]. Cognitive Computation, 2021, 13(4): 1008-1018. 10.1007/s12559-021-09881-2 |
46 | ROZADO D. The political biases of ChatGPT [J]. Social Sciences, 2023, 12(3): 148. 10.3390/socsci12030148 |
47 | LUO G, ZHANG X P. Focusing on ChatGPT: development, impact and problems [J]. Chinese Journal of Nature, 2023, 45(2): 106-108. 10.3969/j.issn.0253-9608.2023.02.004 |
48 | WONGSO W, LUCKY H, SUHARTONO D. Pre-trained transformer-based language models for Sundanese [J]. Journal of Big Data, 2022, 9(1): Article No.39. 10.1186/s40537-022-00590-7 |
49 | LIANG P, BOMMASANI R, LEE T, et al. Holistic evaluation of language models [EB/OL]. [2023-04-03]. . 10.1111/nyas.15007 |
50 | LIU C, JIN R, REN Y, et al. M3KE: a massive multi-level multi-subject knowledge evaluation benchmark for Chinese large language models [EB/OL]. [2023-04-03]. . |
51 | SHEN Y, HEACOCK L, ELIAS J, et al. ChatGPT and other large language models are double-edged swords [J]. Radiology, 2023, 307(2): Article No.23. 10.1148/radiol.230163 |
52 | WU S, DREDZE M. Beto, bentz, becas: the surprising cross-lingual effectiveness of BERT [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2019: 833-844. 10.18653/v1/d19-1077 |
53 | PFEIFFER J, VULIĆ I, GUREVYCH I, et al. MAD-X: an adapter-based framework for multi-task cross-lingual transfer [C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 7654-7673. 10.18653/v1/2020.emnlp-main.617 |
54 | REBUFFI S A, BILEN H, VEDALDI A. Learning multiple visual domains with residual adapters [C]// Proceedings of the 2017 Annual Conference on Neural Information Processing Systems. Long Beach, CA: NIPS, 2017: 506-516. 10.1109/cvpr.2018.00847 |
55 | BORNEA M, PAN L, ROSENTHAL S, et al. Multi-lingual transfer learning for QA using translation as data augmentation [C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2021: 12583-12591. 10.1609/aaai.v35i14.17491 |
56 | SUN T, SHAO Y, QIAN H, et al. Black-box tuning for language-model-as-a-service[EB/OL]. [2023-04-03]. . |
57 | BAI Y, KADAVATH S, KUNDU S, et al. Constitutional AI: harmlessness from AI feedback [EB/OL]. [2023-04-03]. . |
58 | JIAO W, WANG W, HUANG J, et al. Is ChatGPT a good translator? A preliminary study [EB/OL]. [2023-04-03]. . |
[1] | Kezheng CHEN, Xiaoran GUO, Yong ZHONG, Zhenping LI. Relation extraction method based on negative training and transfer learning [J]. Journal of Computer Applications, 2023, 43(8): 2426-2430. |
[2] | Zexi JIN, Lei LI, Ji LIU. Transfer learning model based on improved domain separation network [J]. Journal of Computer Applications, 2023, 43(8): 2382-2389. |
[3] | Bona XUAN, Jin LI, Yafei SONG, Zexuan MA. Malicious code classification method based on improved MobileNetV2 [J]. Journal of Computer Applications, 2023, 43(7): 2217-2225. |
[4] | Huibin ZHANG, Liping FENG, Yaojun HAO, Yining WANG. Ancient mural dynasty identification based on attention mechanism and transfer learning [J]. Journal of Computer Applications, 2023, 43(6): 1826-1832. |
[5] | Chuanbiao LI, Yuanwei BI. Stereo matching algorithm based on cross-domain adaptation [J]. Journal of Computer Applications, 2023, 43(10): 3230-3235. |
[6] | Ruijie YANG, Guilin ZHENG. Face liveness detection based on InceptionV3 and feature fusion [J]. Journal of Computer Applications, 2022, 42(7): 2037-2042. |
[7] | Ying CHEN, Jiong YU, Jiaying CHEN, Xusheng DU. Cross-layer data sharing based multi-task model [J]. Journal of Computer Applications, 2022, 42(5): 1447-1454. |
[8] | Mo LI, Tianliang LU, Ziheng XIE. Android malware family classification method based on code image integration [J]. Journal of Computer Applications, 2022, 42(5): 1490-1499. |
[9] | Zumin WANG, Zhihao ZHANG, Jing QIN, Changqing JI. Review of mechanical fault diagnosis technology based on convolutional neural network [J]. Journal of Computer Applications, 2022, 42(4): 1036-1043. |
[10] | Tiankai LIANG, Bi ZENG, Guang CHEN. Federated learning survey:concepts, technologies, applications and challenges [J]. Journal of Computer Applications, 2022, 42(12): 3651-3662. |
[11] | Xiayang SHI, Fengyuan ZHANG, Jiaqi YUAN, Min HUANG. Detection of unsupervised offensive speech based on multilingual BERT [J]. Journal of Computer Applications, 2022, 42(11): 3379-3385. |
[12] | Chenguang LI, Bo ZHANG, Qian ZHAO, Xiaoping CHEN, Xingfu WANG. Empathy prediction from texts based on transfer learning [J]. Journal of Computer Applications, 2022, 42(11): 3603-3609. |
[13] | Bin FAN, Zhi LI, Jian GAO. Deep robust watermarking algorithm based on multiscale knowledge learning [J]. Journal of Computer Applications, 2022, 42(10): 3102-3110. |
[14] | CHEN Zhengtao, HUANG Can, YANG Bo, ZHAO Li, LIAO Yong. Yak face recognition algorithm of parallel convolutional neural network based on transfer learning [J]. Journal of Computer Applications, 2021, 41(5): 1332-1336. |
[15] | WANG Jinkai, JIA Xu. Vein recognition algorithm based on Siamese nonnegative matrix factorization with transferability [J]. Journal of Computer Applications, 2021, 41(3): 898-903. |