Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (9): 2735-2740. DOI: 10.11772/j.issn.1001-9081.2022081295

• Artificial Intelligence •

Few-shot text classification method based on prompt learning

Bihui YU1,2, Xingye CAI1,2, Jingxuan WEI1,2

  1. University of Chinese Academy of Sciences, Beijing 100049, China
    2. Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang Liaoning 110168, China
  • Received:2022-09-05 Revised:2022-12-09 Accepted:2023-01-03 Online:2023-02-28 Published:2023-09-10
  • Contact: Xingye CAI
  • About author:YU Bihui, born in 1982, Ph. D., research fellow. His research interests include knowledge engineering, big data, semantic Web.
    WEI Jingxuan, born in 1998, M. S. candidate. His research interests include multimodal classification and generation.
  • Supported by:
    the National Key Research and Development Program of China (2019YFB1405803)

Abstract:

Text classification tasks usually rely on sufficient labeled data. To address the over-fitting problem of classification models trained on small samples in low-resource scenarios, a few-shot text classification method based on prompt learning, called BERT-P-Tuning, was proposed. Firstly, the pre-trained model BERT (Bidirectional Encoder Representations from Transformers) was used to learn the optimal prompt template from the labeled samples. Then, the prompt template and a blank (mask) position were appended to each sample, transforming the text classification task into a cloze task. Finally, the final label was obtained by predicting the word with the highest probability at the blank position and applying the mapping between that word and the labels. Experimental results on the short text classification tasks of the public dataset FewCLUE show that the proposed method achieves significant improvements in evaluation metrics over the BERT fine-tuning based method. Specifically, the proposed method improves the accuracy and F1 score by 25.2 and 26.7 percentage points respectively on the binary classification task, and by 6.6 and 8.0 percentage points respectively on the multi-class classification task. Compared with the PET (Pattern Exploiting Training) method, which constructs templates manually, the proposed method improves the accuracy by 2.9 and 2.8 percentage points on the two tasks respectively, and the F1 score by 4.4 and 4.2 percentage points respectively, verifying the effectiveness of applying pre-trained models to few-shot tasks.
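
To make the workflow above concrete, the following is a minimal sketch in Python of prompt-based cloze classification with a masked language model, using the Hugging Face transformers library. It is not the authors' implementation: the model name, template wording, and label-word mapping (verbalizer) are illustrative assumptions, and P-Tuning proper would learn continuous prompt embeddings rather than the fixed, hand-written template used here.

    # Minimal sketch: prompt-based cloze classification with a masked language model.
    # The model name, template, and label words below are illustrative assumptions.
    import torch
    from transformers import BertForMaskedLM, BertTokenizerFast

    MODEL_NAME = "bert-base-chinese"                    # assumed backbone
    tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
    model = BertForMaskedLM.from_pretrained(MODEL_NAME).eval()

    # Hand-written template and verbalizer (label word -> class name).
    TEMPLATE = "这条评论的情感是{mask}的。"
    VERBALIZER = {"好": "positive", "差": "negative"}

    def classify(text: str) -> str:
        # Append the template with a [MASK] slot, turning classification into a cloze task.
        prompt = text + TEMPLATE.format(mask=tokenizer.mask_token)
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits             # (1, seq_len, vocab_size)
        # Read the logits at the [MASK] position.
        mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
        mask_logits = logits[0, mask_pos]
        # Pick the label word with the highest score and map it back to a class.
        word_ids = tokenizer.convert_tokens_to_ids(list(VERBALIZER))
        scores = {cls: mask_logits[i].item() for cls, i in zip(VERBALIZER.values(), word_ids)}
        return max(scores, key=scores.get)

    print(classify("这家餐厅的菜很新鲜,服务也很周到。"))   # expected output: positive

In the full method, the fixed template tokens would be replaced by trainable prompt embeddings optimized on the labeled samples, while the masked-word prediction and word-to-label mapping steps remain as sketched.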

Key words: few-shot learning, text classification, pre-trained model, prompt learning, adaptive template

CLC Number: