Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (S2): 1-8. DOI: 10.11772/j.issn.1001-9081.2023040428
• Artificial intelligence •
Yuemei XU 1, Ling HU 1, Jiayi ZHAO 1, Wanze DU 2, Wenqing WANG 2
Received:
2023-04-20
Revised:
2023-08-17
Accepted:
2023-08-21
Online:
2024-01-09
Published:
2023-12-31
Contact:
Yuemei XU
Corresponding author:
Yuemei XU
About the author:
XU Yuemei (1985—), female, born in Wuzhou, Guangxi; associate professor, Ph. D.; her main research interest is cross-lingual natural language processing.
Supported by:
CLC Number:
Yuemei XU, Ling HU, Jiayi ZHAO, Wanze DU, Wenqing WANG. Research progress and enlightenment of large language models on multi-lingual intelligence[J]. Journal of Computer Applications, 2023, 43(S2): 1-8.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023040428
Model | Task | Supported languages | Parameter size | Strengths | Weaknesses |
---|---|---|---|---|---|
Multi-BERT[ | Zero-shot cross-lingual transfer | 104 languages | 110 million–340 million | Performs well on zero-shot cross-lingual tasks, especially when the source and target languages are similar | Shows systematic deficiencies in its multilingual representations for certain language pairs |
XLM[ | Cross-lingual representation for pre-trained models | Over 100 languages | 270 million | Uses parallel corpora to align model representations, improving the cross-lingual representation ability of pre-trained models | Training data is relatively small, especially for low-resource languages |
XLM-RoBERTa[ | Cross-lingual classification, sequence labeling and question answering | 100 languages | 550 million | Large-scale multilingual pre-training; performs well on cross-lingual classification, sequence labeling and question answering | Its vocabulary contains many code-mixed subwords, which keeps the model from capturing the intrinsic meaning of sentences |
MetaXL[ | Meta-learning framework that learns transferable language representations | All languages | — | Brings the target and source languages closer in the representation space, giving good transfer performance | Placing multiple transformation networks at several layers of the pre-trained model has not yet been explored |
mT5[ | Unifies multiple tasks in a text-to-text framework | 101 languages | 300 million–13 billion | Casts multilingual text classification, reading comprehension, summarization and other tasks as a single text-generation task | Essentially reuses the Transformer architecture without structural innovation |
ChatGPT[ | General-purpose large language model | 95 languages | 175 billion | The first large language model to support 95 languages, including English, Chinese, Spanish, French and German | Generation speed and quality in English are clearly better than in other languages |
ChatGLM[ | General-purpose large language model | Chinese and English | 6 billion | Supports Chinese and English, with strong Chinese dialogue generation | Focuses on improving Chinese performance and supports fewer languages than other general-purpose large language models |
BLOOM[ | Open multilingual language model | 46 languages | 176 billion | Supports 46 languages including English, Chinese, Vietnamese and Catalan; for most of them it is the first large language model to support the language | Does not resolve issues such as the ethical and language biases of multilingual models |
LLaMA[ | General-purpose large language model | 20 languages | 7 billion–65 billion | Trained only on publicly available data; more efficient and less compute-intensive than ChatGPT | Does not resolve issues such as the ethical and language biases of multilingual models |
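To make the zero-shot cross-lingual transfer setting described in the Multi-BERT and XLM-RoBERTa rows concrete, the following minimal sketch (not taken from the surveyed papers) loads a multilingual encoder and applies it to sentences in several languages; the checkpoint name `xlm-roberta-base` and the two-label classification head are illustrative assumptions.

```python
# Minimal sketch (assumption: the public Hugging Face checkpoint "xlm-roberta-base")
# of zero-shot cross-lingual transfer: the encoder is fine-tuned on labeled data in
# one language (e.g. English, omitted here) and then applied unchanged to other
# languages that share its subword vocabulary.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "xlm-roberta-base"  # multilingual encoder covering ~100 languages
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = [
    "This movie was wonderful.",        # English: the (hypothetical) training language
    "Cette pièce était merveilleuse.",  # French: zero-shot target language
    "这部电影非常精彩。",                # Chinese: zero-shot target language
]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
# Per-sentence class probabilities; only meaningful after fine-tuning on one language.
print(logits.softmax(dim=-1))
```

Because all languages are mapped into one shared representation space, the same classification head can be reused across languages, which is the property the zero-shot transfer rows in the table describe.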
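The mT5 row describes casting every task as text-to-text generation. The sketch below is again only illustrative, assuming the public `google/mt5-small` checkpoint and a made-up prompt format; a task-specific fine-tuning step, omitted here, is still required before the generated string becomes a usable label.

```python
# Minimal sketch (assumptions: the public "google/mt5-small" checkpoint and an
# illustrative prompt format) of the text-to-text framing used by mT5: the inputs
# and outputs of every task are plain strings, so classification, QA and
# summarization all reduce to sequence-to-sequence generation.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

# A sentiment-classification instance rephrased as generation; after fine-tuning,
# the decoded string would be the class label (e.g. "positive").
prompt = "classify sentiment: 这部电影非常精彩。"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```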
1 | WEI J, TAY Y, BOMMASANI R, et al. Emergent abilities of large language models [EB/OL]. [2023-04-01].. |
2 | GOERTZEL B. Artificial general intelligence: concept, state of the art, and future prospects [J]. Journal of Artificial General Intelligence, 2014, 5(1): 1-48. 10.2478/jagi-2014-0001 |
3 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [EB/OL].[2023-04-01]. . 10.3126/jiee.v3i1.34327 |
4 | PENNINGTON J, SOCHER R, MANNING C D. GloVe: Global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1532-1543. 10.3115/v1/d14-1162 |
5 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 2017 Conference on Advances in Neural Information Processing Systems. Long Beach: NIPS, 2017: 5998-6008. |
6 | DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL. 2019: 4171-4186. 10.18653/v1/n18-2 |
7 | RADFORD A, NARASIMHAN K, SALIMANS T, et al. Improving language understanding by generative pre-training [EB/OL]. [2023-04-01].. 10.4324/9781003267836-1 |
8 | RADFORD A, WU J, CHILD R, et al. Language models are unsupervised multitask learners [EB/OL]. [2023-04-01].. |
9 | BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners [C]// Proceedings of the 34th Conference on Neural Information Processing Systems. Long Beach: NIPS, 2020, 33: 1877-1901. |
10 | JIAO W, WANG W, HUANG J-T, et al. Is ChatGPT a good translator? Yes with GPT-4 as the engine [EB/OL]. [2023-04-02]. . |
11 | OpenAI. ChatGPT plugins [EB/OL]. [2023-04-03]. . |
12 | RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer [J]. Journal of Machine Learning Research, 2020, 21(1):5485-5551. |
13 | KIM Y. Convolutional neural networks for sentence classification [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1746-1751. 10.3115/v1/d14-1181 |
14 | YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding [C]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: NIPS, 2019: 5754-5764. |
15 | LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. [2023-04-03]. . |
16 | LAN Z, CHEN M, GOODMAN S, et al. ALBERT: A lite BERT for self-supervised learning of language representations [EB/OL]. [2023-04-03]. . 10.1109/slt48900.2021.9383575 |
17 | CLARK K, LUONG M T, LE Q V, et al. ELECTRA: Pre-training text encoders as discriminators rather than generators [EB/OL]. [2023-04-03]. . 10.18653/v1/2020.emnlp-main.20 |
18 | OUYANG L, WU J, JIANG X, et al. Training language models to follow instructions with human feedback [EB/OL]. [2023-04-03]. . |
19 | THOPPILAN R, DE FREITAS D, HALL J, et al. LaMDA: language models for dialog applications [EB/OL]. [2023-04-03].. |
20 | CHOWDHERY A, NARANG S, DEVLIN J, et al. PaLM: scaling language modeling with pathways [EB/OL]. [2023-04-03]. . |
21 | ANIL R, DAI A M, FIRAT O, et al. PaLM 2 technical report [EB/OL]. [2023-04-03]. . |
22 | TOUVRON H, LAVRIL T, IZACARD G, et al. LLaMA: open and efficient foundation language models [EB/OL]. [2023-04-03]. . |
23 | CHIANG W L, LI Z, LIN Z, et al. Vicuna: an open-source chatbot impressing GPT-4 with 90%* ChatGPT quality [EB/OL]. [2023-04-03]. . |
24 | SMITH S, PATWARY M, NORICK B, et al. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model [EB/OL]. [2023-04-03]. . |
25 | ZENG W, REN X, SU T, et al. PanGu‑α: large-scale autoregressive pretrained Chinese language models with auto-parallel computation [EB/OL]. [2023-04-03]. . |
26 | REN X, ZHOU P, MENG X, et al. PanGu‑Σ: towards trillion parameter language model with sparse heterogeneous computing [EB/OL]. [2023-04-03]. . |
27 | ZENG A, LIU X, DU Z, et al. GLM-130B: an open bilingual pre-trained model[EB/OL]. [2023-04-03]. . |
28 | PIRES T, SCHLINGER E, GARRETTE D. How multi-lingual is multi-lingual BERT? [C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 4996-5001. 10.18653/v1/p19-1493 |
29 | WU S, DREDZE M. Are all languages created equal in multi-lingual BERT? [C]// Proceedings of the 5th Workshop on Representation Learning for NLP. Stroudsburg, PA: ACL, 2020: 120-130. 10.18653/v1/2020.repl4nlp-1.16 |
30 | SCHWENK H, LI X. A corpus for multi-lingual document classification in eight languages [EB/OL]. [2023-04-03]. . |
31 | DONG X, DE MELO G. A robust self-learning framework for cross-lingual text classification [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2019: 6306-6310. 10.18653/v1/d19-1658 |
32 | LAMPLE G, CONNEAU A. Cross-lingual language model pretraining [C]// Proceedings of the 33rd Conference on Neural Information Processing Systems. Vancouver: NIPS, 2019: 7057-7067. 10.18653/v1/d18-1549 |
33 | CONNEAU A, KHANDELWAL K, GOYAL N, et al. Unsupervised cross-lingual representation learning at scale [C]// Proceedings of the 58th Conference of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 8440-8451. 10.18653/v1/2020.acl-main.747 |
34 | HOSSAIN E, SHARIF O, HOQUE M M. NLP-CUET@LT-EDI-EACL2021: multi-lingual code-mixed hope speech detection using cross-lingual representation learner [EB/OL]. [2023-04-03]. . |
35 | PAN S J, YANG Q. A survey on transfer learning [J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 22(10): 1345-1359. 10.1109/tkde.2009.191 |
36 | XIA M, ZHENG G, MUKHERJEE S, et al. MetaXL: meta representation transformation for low-resource cross-lingual learning [C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.Stroudsburg, PA: ACL, 2021:499-511. 10.18653/v1/2021.naacl-main.42 |
37 | XUE L, CONSTANT N, ROBERTS A, et al. mT5: a massively multi-lingual pre-trained text-to-text transformer[C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: ACL, 2021:483-498. 10.18653/v1/2021.naacl-main.41 |
38 | BUBECK S, CHANDRASEKARAN V, ELDAN R, et al. Sparks of artificial general intelligence: early experiments with GPT-4 [EB/OL]. [2023-04-03]. . 10.3390/a16090432 |
39 | SCAO T L, FAN A, AKIKI C, et al. BLOOM: a 176B-parameter open-access multi-lingual language model [EB/OL]. [2023-04-03]. . |
40 | FARRA N. Cross-lingual and low-resource sentiment analysis [D]. New York: Columbia University,2019: 1-266. |
41 | KURITA K, VYAS N, PAREEK A. Measuring bias in contextualized word representations [C]// Proceedings of the 1st Workshop on Gender Bias in Natural Language Processing. Stroudsburg, PA: ACL, 2019: 166-172. 10.18653/v1/w19-3823 |
42 | KANEKO M, BOLLEGALA D. Unmasking the mask-evaluating social biases in masked language models [C]// Proceedings of the 36th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2022: 11954-11962. 10.1609/aaai.v36i11.21453 |
43 | NADEEM M, BETHKE A, REDDY S. StereoSet: measuring stereotypical bias in pretrained language models [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2021:5356-5371. 10.18653/v1/2021.acl-long.416 |
44 | BLODGETT S L, BAROCAS S, DAUMÉ Ⅲ H, et al. Language (technology) is power: a critical survey of “bias” in NLP [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020:5454-5476. 10.18653/v1/2020.acl-main.485 |
45 | BHARDWAJ R, MAJUMDER N, PORIA S. Investigating gender bias in BERT [J]. Cognitive Computation, 2021, 13(4): 1008-1018. 10.1007/s12559-021-09881-2 |
46 | ROZADO D. The political biases of ChatGPT [J]. Social Sciences, 2023, 12(3): 148. 10.3390/socsci12030148 |
47 | LUO G, ZHANG X P. Focusing on ChatGPT: development, impact and problems [J]. Chinese Journal of Nature, 2023, 45(2): 106-108. 10.3969/j.issn.0253-9608.2023.02.004 |
48 | WONGSO W, LUCKY H, SUHARTONO D. Pre-trained transformer-based language models for Sundanese [J]. Journal of Big Data, 2022, 9(1): Article No.39. 10.1186/s40537-022-00590-7 |
49 | LIANG P, BOMMASANI R, LEE T, et al. Holistic evaluation of language models [EB/OL]. [2023-04-03]. . 10.1111/nyas.15007 |
50 | LIU C, JIN R, REN Y, et al. M3KE: a massive multi-level multi-subject knowledge evaluation benchmark for Chinese large language models [EB/OL]. [2023-04-03]. . |
51 | SHEN Y, HEACOCK L, ELIAS J, et al. ChatGPT and other large language models are double-edged swords [J]. Radiology, 2023, 307(2): Article No.23. 10.1148/radiol.230163 |
52 | WU S, DREDZE M. Beto, bentz, becas: the surprising cross-lingual effectiveness of BERT [C]// Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Stroudsburg, PA: ACL, 2019: 833-844. 10.18653/v1/d19-1077 |
53 | PFEIFFER J, VULIĆ I, GUREVYCH I, et al. MAD-X: an adapter-based framework for multi-task cross-lingual transfer [C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 7654-7673. 10.18653/v1/2020.emnlp-main.617 |
54 | REBUFFI S A, BILEN H, VEDALDI A. Learning multiple visual domains with residual adapters [C]// Proceedings of the 2017 Annual Conference on Neural Information Processing Systems. Long Beach, CA: NIPS, 2017: 506-516. 10.1109/cvpr.2018.00847 |
55 | BORNEA M, PAN L, ROSENTHAL S, et al. Multi-lingual transfer learning for QA using translation as data augmentation [C]// Proceedings of the 35th AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press, 2021: 12583-12591. 10.1609/aaai.v35i14.17491 |
56 | SUN T, SHAO Y, QIAN H, et al. Black-box tuning for language-model-as-a-service[EB/OL]. [2023-04-03]. . |
57 | BAI Y, KADAVATH S, KUNDU S, et al. Constitutional AI: harmlessness from AI feedback [EB/OL]. [2023-04-03]. . |
58 | JIAO W, WANG W, HUANG J, et al. Is ChatGPT a good translator? A preliminary study [EB/OL]. [2023-04-03]. . |
[1] | Kezheng CHEN, Xiaoran GUO, Yong ZHONG, Zhenping LI. Relation extraction method based on negative training and transfer learning [J]. Journal of Computer Applications, 2023, 43(8): 2426-2430. |
[2] | Zexi JIN, Lei LI, Ji LIU. Transfer learning model based on improved domain separation network [J]. Journal of Computer Applications, 2023, 43(8): 2382-2389. |
[3] | Bona XUAN, Jin LI, Yafei SONG, Zexuan MA. Malicious code classification method based on improved MobileNetV2 [J]. Journal of Computer Applications, 2023, 43(7): 2217-2225. |
[4] | Huibin ZHANG, Liping FENG, Yaojun HAO, Yining WANG. Ancient mural dynasty identification based on attention mechanism and transfer learning [J]. Journal of Computer Applications, 2023, 43(6): 1826-1832. |
[5] | Chuanbiao LI, Yuanwei BI. Stereo matching algorithm based on cross-domain adaptation [J]. Journal of Computer Applications, 2023, 43(10): 3230-3235. |
[6] | Ruijie YANG, Guilin ZHENG. Face liveness detection based on InceptionV3 and feature fusion [J]. Journal of Computer Applications, 2022, 42(7): 2037-2042. |
[7] | Ying CHEN, Jiong YU, Jiaying CHEN, Xusheng DU. Cross-layer data sharing based multi-task model [J]. Journal of Computer Applications, 2022, 42(5): 1447-1454. |
[8] | Mo LI, Tianliang LU, Ziheng XIE. Android malware family classification method based on code image integration [J]. Journal of Computer Applications, 2022, 42(5): 1490-1499. |
[9] | Zumin WANG, Zhihao ZHANG, Jing QIN, Changqing JI. Review of mechanical fault diagnosis technology based on convolutional neural network [J]. Journal of Computer Applications, 2022, 42(4): 1036-1043. |
[10] | Tiankai LIANG, Bi ZENG, Guang CHEN. Federated learning survey:concepts, technologies, applications and challenges [J]. Journal of Computer Applications, 2022, 42(12): 3651-3662. |
[11] | Xiayang SHI, Fengyuan ZHANG, Jiaqi YUAN, Min HUANG. Detection of unsupervised offensive speech based on multilingual BERT [J]. Journal of Computer Applications, 2022, 42(11): 3379-3385. |
[12] | Chenguang LI, Bo ZHANG, Qian ZHAO, Xiaoping CHEN, Xingfu WANG. Empathy prediction from texts based on transfer learning [J]. Journal of Computer Applications, 2022, 42(11): 3603-3609. |
[13] | Bin FAN, Zhi LI, Jian GAO. Deep robust watermarking algorithm based on multiscale knowledge learning [J]. Journal of Computer Applications, 2022, 42(10): 3102-3110. |
[14] | CHEN Zhengtao, HUANG Can, YANG Bo, ZHAO Li, LIAO Yong. Yak face recognition algorithm of parallel convolutional neural network based on transfer learning [J]. Journal of Computer Applications, 2021, 41(5): 1332-1336. |
[15] | WANG Jinkai, JIA Xu. Vein recognition algorithm based on Siamese nonnegative matrix factorization with transferability [J]. Journal of Computer Applications, 2021, 41(3): 898-903. |