Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2735-2740.DOI: 10.11772/j.issn.1001-9081.2022081295

• Artificial intelligence •

Few-shot text classification method based on prompt learning

Bihui YU1,2, Xingye CAI1,2(), Jingxuan WEI1,2   

  1. University of Chinese Academy of Sciences, Beijing 100049, China
    2. Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, Liaoning 110168, China
  • Received:2022-09-05 Revised:2022-12-09 Accepted:2023-01-03 Online:2023-02-28 Published:2023-09-10
  • Contact: Xingye CAI
  • About author:YU Bihui, born in 1982, Ph. D., research fellow. His research interests include knowledge engineering, big data, and the semantic Web.
    WEI Jingxuan, born in 1998, M. S. candidate. His research interests include multimodal classification and generation.
  • Supported by:
    the National Key Research and Development Program of China (2019YFB1405803)



Text classification tasks usually rely on sufficient labeled data. To address the over-fitting of classification models on small samples in low-resource scenarios, a few-shot text classification method based on prompt learning, called BERT-P-Tuning, was proposed. Firstly, the pre-trained model BERT (Bidirectional Encoder Representations from Transformers) was used to learn the optimal prompt template from the labeled samples. Then, the prompt template and a vacancy were appended to each sample, transforming the text classification task into a cloze test task. Finally, the final label was obtained by predicting the word with the highest probability at the vacant position and applying the mapping between that word and the labels. Experimental results on the short text classification tasks of the public dataset FewCLUE show that the proposed method improves the evaluation metrics significantly compared with the BERT fine-tuning based method. Specifically, the proposed method increases the accuracy and F1 score by 25.2 and 26.7 percentage points respectively on the binary classification task, and by 6.6 and 8.0 percentage points respectively on the multi-class classification task. Compared with the PET (Pattern Exploiting Training) method, which constructs templates manually, the proposed method increases the accuracy by 2.9 and 2.8 percentage points on the two tasks respectively, and the F1 score by 4.4 and 4.2 percentage points respectively. These results verify the effectiveness of applying pre-trained models to few-shot tasks.
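The abstract's pipeline (append a prompt template with a vacancy, predict the word at the vacancy, map that word back to a label) can be sketched as follows. This is a minimal illustration only: the template, the word-to-label mapping (verbalizer), and the toy scoring function are assumptions standing in for BERT's learned template and masked-language-model head, not the paper's actual artifacts.

```python
# Hypothetical sketch of prompt-based cloze classification.
# TEMPLATE, VERBALIZER, and toy_mlm are illustrative assumptions,
# not the learned template or model from the paper.

TEMPLATE = "{text} 总体感觉很[MASK]。"  # prompt template with a vacancy

# Verbalizer: maps candidate fill-in words at [MASK] to task labels.
VERBALIZER = {"好": "positive", "差": "negative"}

def build_cloze(text: str) -> str:
    """Append the prompt template so classification becomes a cloze test."""
    return TEMPLATE.format(text=text)

def classify(text: str, mask_word_probs) -> str:
    """Choose the label whose verbalizer word has the highest predicted
    probability at the [MASK] position; `mask_word_probs` stands in for
    a pre-trained masked-language-model head."""
    probs = mask_word_probs(build_cloze(text))
    best_word = max(VERBALIZER, key=lambda w: probs.get(w, 0.0))
    return VERBALIZER[best_word]

# Toy stand-in for an MLM: scores "好" (good) higher when the text
# contains a positive cue word.
def toy_mlm(cloze: str) -> dict:
    return {"好": 0.9, "差": 0.1} if "推荐" in cloze else {"好": 0.2, "差": 0.8}

print(classify("这家餐厅强烈推荐", toy_mlm))  # → positive
print(classify("体验很糟糕", toy_mlm))        # → negative
```

In the actual method, the template tokens are continuous embeddings optimized on the labeled samples (P-Tuning) rather than a fixed string, and the probabilities come from BERT's masked-language-model head.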

Key words: few-shot learning, text classification, pre-trained model, prompt learning, adaptive template



