Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (5): 1236-1246. DOI: 10.11772/j.issn.1001-9081.2020081152
Corresponding author: LIU Ruiheng
About the authors: LIU Ruiheng, born in 1997 in Lantian, Shaanxi, M. S. candidate. His research interests include natural language processing and data analysis. YE Xia, born in 1977 in Luhe, Jiangsu, Ph. D., associate professor. Her research interests include databases and computer networks. YUE Zengying, born in 1991 in Jining, Shandong, M. S. candidate. His research interests include natural language processing and data mining.
LIU Ruiheng, YE Xia, YUE Zengying
Received: 2020-08-03
Revised: 2020-11-15
Online: 2020-12-09
Published: 2021-05-10
Abstract: In recent years, deep learning technology has developed rapidly. In Natural Language Processing (NLP) tasks, as text representation techniques have advanced from the word level to the document level, unsupervised pre-training on large-scale corpora has been shown to effectively improve model performance on downstream tasks. Firstly, according to the development of text feature extraction techniques, typical models were analyzed at the word level and the document level. Secondly, the current research status of pre-trained models was analyzed from the two stages of pre-training objective tasks and downstream applications, and the characteristics of representative models were sorted out and summarized. Finally, the main challenges facing the development of pre-trained models were summarized, and prospects for the future were proposed.
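The pre-train/fine-tune paradigm summarized above can be made concrete with a minimal sketch (not taken from the paper): an encoder pre-trained on a large unlabeled corpus is loaded and then fine-tuned on a small labeled downstream classification task. The Hugging Face transformers and PyTorch APIs, the bert-base-chinese checkpoint, the toy data, and the hyper-parameters are all assumptions made purely for illustration.

```python
# Illustrative sketch of the pre-train / fine-tune paradigm: a BERT encoder
# pre-trained on a large unlabeled corpus is loaded and fine-tuned on a tiny
# labeled downstream task (binary sentence classification). The checkpoint
# name, toy data and hyper-parameters are assumptions, not from the paper.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # adds a randomly initialized task head

# Hypothetical labeled downstream examples.
texts = ["这部电影非常精彩", "服务太差了，不会再来"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for epoch in range(3):  # a few epochs usually suffice for fine-tuning
    outputs = model(**batch, labels=labels)  # cross-entropy loss on the task head
    outputs.loss.backward()                  # gradients flow into all encoder layers
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)  # downstream predictions
```

In this sketch only the small task-specific head is newly initialized, while all pre-trained encoder weights are updated during fine-tuning; feature-based and adapter-based adaptation strategies, which the survey also covers, would instead freeze most of the pre-trained parameters.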
LIU Ruiheng, YE Xia, YUE Zengying. Review of pre-trained models for natural language processing tasks[J]. Journal of Computer Applications, 2021, 41(5): 1236-1246.