Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (6): 1767-1774. DOI: 10.11772/j.issn.1001-9081.2023050709
• Artificial Intelligence •
Few-shot news topic classification method based on knowledge enhancement and prompt learning
Xinyan YU1, Cheng ZENG1,2,3, Qian WANG1, Peng HE2,3,4, Xiaoyu DING1

Received: 2023-06-06
Revised: 2023-11-15
Accepted: 2023-12-05
Online: 2024-01-04
Published: 2024-06-10
Contact: Cheng ZENG
About author: YU Xinyan, born in 1995, M.S. candidate. Her research interests include natural language processing and few-shot learning.
Abstract: Classification methods based on pre-training and fine-tuning usually require a large amount of labeled data, which makes them inapplicable to few-shot classification tasks. Therefore, for Chinese few-shot news topic classification, a classification method based on Knowledge enhancement and Prompt Learning (KPL) was proposed. First, the optimal prompt template was learned on the training set with a pre-trained language model. Then, the template was combined with the input text, turning the classification task into a cloze-style task; at the same time, external knowledge was used to expand the label word space and enrich the semantic information of the label words. Finally, the predicted label words were mapped back to the original labels. Few-shot training and validation sets were formed by random sampling from three news datasets: THUCNews, SHNews and Toutiao. The experimental results show that the proposed method improves overall performance on the 1-shot, 5-shot, 10-shot and 20-shot tasks on these datasets, with an especially notable gain on the 1-shot task: compared with baseline few-shot classification methods, the accuracy is increased by at least 7.59, 2.11 and 3.10 percentage points respectively, verifying the effectiveness of KPL on few-shot news topic classification.
Cite this article: Xinyan YU, Cheng ZENG, Qian WANG, Peng HE, Xiaoyu DING. Few-shot news topic classification method based on knowledge enhancement and prompt learning [J]. Journal of Computer Applications, 2024, 44(6): 1767-1774.
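The core inference step of KPL can be illustrated as follows: a template turns the news text into a cloze sentence, and each label is scored through its expanded label-word set at the [MASK] position. The sketch below is a minimal illustration with an off-the-shelf Chinese BERT; the model choice, the tiny label-word sets, and the mean-over-characters aggregation are simplifying assumptions for illustration, not the paper's exact configuration.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese").eval()

# Knowledge-expanded label-word sets (cf. Tab. 1); tiny illustrative subset.
label_words = {
    "房地产": ["房地产", "房产", "房地产业"],
    "教育": ["高考", "考生", "高中"],
}

def classify(text: str) -> str:
    # Wrap the input with a cloze template from Tab. 3.
    prompt = f"这是一条{tokenizer.mask_token}新闻:{text}"
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=256)
    pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**enc).logits[0, pos]                # vocabulary logits at [MASK]
    scores = {}
    for label, words in label_words.items():
        per_word = []
        for w in words:
            ids = tokenizer.convert_tokens_to_ids(list(w))  # char-level tokens for Chinese BERT
            per_word.append(logits[ids].mean())             # mean over characters (simplification)
        scores[label] = torch.stack(per_word).mean()        # aggregate the whole word set
    return max(scores, key=scores.get)                      # map best-scoring set back to its label

print(classify("北京二手房成交量环比上涨"))                  # likely 房地产
```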
Tab. 1 Examples of label word sets for different datasets

| Dataset | Label | Label word set |
|---|---|---|
| THUCNews | 房地产 | 房地产, 房产, 房地产业 |
| THUCNews | 金融 | 金银, 金融业, 金融市场 |
| THUCNews | 教育 | 高考, 考生, 高中, 文综 |
| Toutiao | 电竞 | 网络游戏, 竞技, 玩家 |
| Toutiao | 农业 | 第一产业, 农林牧副渔, 农林 |
| Toutiao | 证券 | 出游, 旅行, 出行 |
| SHNews | 科技 | 高科技, 高新技术, 技术 |
| SHNews | 文化 | 文明, 人文, 文风 |
| SHNews | 旅游 | 出游, 旅行, 出行 |
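The word sets in Tab. 1 come from expanding each label with related words drawn from external knowledge. A minimal sketch of such an expansion follows; the `related_words` lookup is a hypothetical stand-in for the external knowledge source, and the vocabulary filter (dropping words containing characters unknown to the PLM) is an illustrative assumption.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

# Hypothetical external knowledge lookup; stands in for whatever
# knowledge source supplies related words for each label.
related_words = {
    "旅游": ["出游", "旅行", "出行"],
    "科技": ["高科技", "高新技术", "技术"],
}

def expand_label(label: str) -> list[str]:
    candidates = [label] + related_words.get(label, [])
    # Keep only words whose characters are all in the PLM vocabulary,
    # so the verbalizer can score every expanded word.
    kept = []
    for w in candidates:
        ids = tokenizer.convert_tokens_to_ids(list(w))
        if tokenizer.unk_token_id not in ids:
            kept.append(w)
    return list(dict.fromkeys(kept))  # dedupe, keep order

print(expand_label("旅游"))  # ['旅游', '出游', '旅行', '出行']
```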
Tab. 2 Statistics of experiment datasets

| Dataset | Training samples | Validation samples | Test samples | Label classes |
|---|---|---|---|---|
| THUCNews | 180 000 | 10 000 | 10 000 | 10 |
| Toutiao | 267 877 | 57 401 | 57 401 | 15 |
| SHNews | 22 699 | 5 764 | 5 755 | 12 |
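The few-shot splits are formed by randomly sampling k examples per class from each full training set in Tab. 2 (k = 1, 5, 10, 20). A minimal sampling sketch, with the fixed seed as an illustrative choice:

```python
import random
from collections import defaultdict

def sample_k_shot(examples, k, seed=42):
    """examples: list of (text, label) pairs; returns a subset with k examples per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    subset = []
    for label, items in by_label.items():
        subset.extend(rng.sample(items, k))  # assumes every class has at least k examples
    return subset
```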
Tab. 3 Chinese templates for experiment datasets

| Dataset | Template |
|---|---|
| THUCNews | 这是一条[MASK]新闻:x |
| THUCNews | [MASK]新闻:x |
| THUCNews | x是[MASK]新闻 |
| Toutiao | 这是一条[MASK]新闻:x |
| Toutiao | [MASK]新闻:x |
| Toutiao | 分类:[MASK]x |
| SHNews | 这是一条[MASK]新闻:x |
| SHNews | [MASK]新闻:x |
| SHNews | 主题:[MASK]x |
Tab. 4 Experiment results of 1/5/10/20-shot text classification on different datasets (unit: %)

| k-shot | Model | THUCNews Acc | THUCNews Macro_F1 | Toutiao Acc | Toutiao Macro_F1 | SHNews Acc | SHNews Macro_F1 |
|---|---|---|---|---|---|---|---|
| 1 | FT | 48.27 | 45.90 | 28.86 | 28.71 | 28.78 | 26.79 |
| 1 | Soft-verb | 67.29 | 66.62 | 63.91 | 58.44 | 55.98 | 54.31 |
| 1 | Auto-verb | 34.38 | 31.98 | 37.11 | 31.27 | 30.36 | 27.50 |
| 1 | PET | 69.53 | 68.92 | 59.84 | 55.07 | 56.28 | 54.48 |
| 1 | Soft-prompt | 65.00 | 64.22 | 61.00 | 55.66 | 47.69 | 46.41 |
| 1 | KPL | 77.12 | 76.94 | 67.01 | 61.68 | 58.39 | 56.46 |
| 5 | FT | 78.72 | 78.67 | 67.94 | 68.08 | 57.72 | 58.28 |
| 5 | Soft-verb | 81.97 | 81.77 | 73.04 | 67.03 | 65.67 | 65.34 |
| 5 | Auto-verb | 76.47 | 75.52 | 68.58 | 62.84 | 58.78 | 58.45 |
| 5 | PET | 82.23 | 82.06 | 72.58 | 66.87 | 65.48 | 65.27 |
| 5 | Soft-prompt | 80.71 | 80.26 | 73.06 | 67.20 | 65.13 | 64.78 |
| 5 | KPL | 82.95 | 82.78 | 74.02 | 68.04 | 66.08 | 65.80 |
| 10 | FT | 81.53 | 81.43 | 72.16 | 72.74 | 64.29 | 64.44 |
| 10 | Soft-verb | 84.35 | 84.30 | 74.97 | 69.09 | 67.51 | 67.37 |
| 10 | Auto-verb | 80.72 | 79.72 | 73.84 | 67.68 | 64.98 | 64.46 |
| 10 | PET | 84.16 | 84.07 | 74.58 | 68.95 | 67.85 | 67.66 |
| 10 | Soft-prompt | 84.95 | 84.92 | 75.00 | 68.87 | 66.36 | 65.88 |
| 10 | KPL | 85.50 | 85.38 | 75.21 | 69.38 | 68.60 | 68.54 |
| 20 | FT | 83.88 | 83.89 | 74.88 | 75.15 | 66.36 | 66.53 |
| 20 | Soft-verb | 86.19 | 86.12 | 76.55 | 70.57 | 69.30 | 69.26 |
| 20 | Auto-verb | 81.78 | 79.95 | 76.50 | 70.19 | 68.15 | 67.81 |
| 20 | PET | 86.67 | 86.60 | 75.70 | 69.72 | 68.82 | 68.91 |
| 20 | Soft-prompt | 86.48 | 86.45 | 76.73 | 71.09 | 69.59 | 69.47 |
| 20 | KPL | 87.04 | 86.96 | 77.21 | 71.24 | 70.31 | 70.16 |
Tab. 5 Ablation experiment results on THUCNews (unit: %)

| k-shot | P-tuning | Know | Acc | Macro_F1 |
|---|---|---|---|---|
| 1 | × | × | 72.19 | 71.84 |
| 1 | √ | × | 73.56 | 73.07 |
| 1 | × | √ | 76.91 | 76.75 |
| 1 | √ | √ | 77.12 | 76.94 |
| 5 | × | × | 81.29 | 81.01 |
| 5 | √ | × | 82.36 | 82.33 |
| 5 | × | √ | 82.33 | 82.14 |
| 5 | √ | √ | 82.95 | 82.78 |
| 10 | × | × | 84.58 | 84.56 |
| 10 | √ | × | 85.61 | 85.53 |
| 10 | × | √ | 85.56 | 85.50 |
| 10 | √ | √ | 85.50 | 85.38 |
| 20 | × | × | 86.63 | 86.61 |
| 20 | √ | × | 86.70 | 86.65 |
| 20 | × | √ | 86.99 | 86.94 |
| 20 | √ | √ | 87.04 | 86.96 |
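The "P-tuning" component in Tab. 5 learns the prompt template as trainable continuous vectors while the pre-trained model stays frozen. The sketch below is a minimal illustration of this idea; inserting the vectors after [CLS], the prompt length of 4, and the single training step are illustrative assumptions rather than the paper's exact setup.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
for p in model.parameters():
    p.requires_grad = False                      # freeze the PLM; only the prompt vectors train

emb = model.get_input_embeddings()
n_prompt = 4                                     # illustrative prompt length
prompt_vecs = torch.nn.Parameter(torch.randn(n_prompt, emb.embedding_dim) * 0.02)

def mask_logits(text: str) -> torch.Tensor:
    """Vocabulary logits at the [MASK] position, with the soft prompt inserted after [CLS]."""
    enc = tokenizer(f"{tokenizer.mask_token}新闻:{text}", return_tensors="pt")
    tok = emb(enc["input_ids"])                  # [1, L, H] frozen token embeddings
    x = torch.cat([tok[:, :1], prompt_vecs.unsqueeze(0), tok[:, 1:]], dim=1)
    attn = torch.ones(x.shape[:2], dtype=torch.long)
    out = model(inputs_embeds=x, attention_mask=attn)
    pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item() + n_prompt
    return out.logits[0, pos]

# One illustrative training step: push mass toward the first character of label word "教育".
opt = torch.optim.Adam([prompt_vecs], lr=1e-3)
target = tokenizer.convert_tokens_to_ids("教")
loss = torch.nn.functional.cross_entropy(
    mask_logits("高考成绩今日公布").unsqueeze(0), torch.tensor([target]))
loss.backward()
opt.step()
```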
References

[1] DEVLIN J, CHANG M-W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2019-05-24) [2023-05-13].
[2] LIU Y, OTT M, GOYAL N, et al. RoBERTa: a robustly optimized BERT pretraining approach [EB/OL]. (2020-07-26) [2023-05-13].
[3] RAFFEL C, SHAZEER N, ROBERTS A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer [J]. The Journal of Machine Learning Research, 2020, 21(1): 5485-5551.
[4] BROWN T B, MANN B, RYDER N, et al. Language models are few-shot learners [C]// Proceedings of the 34th International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2020: 1877-1901.
[5] SUN C, QIU X, XU Y, et al. How to fine-tune BERT for text classification? [C]// Proceedings of the 18th China National Conference on Chinese Computational Linguistics. Berlin: Springer, 2019: 194-206.
[6] WANG Q, ZENG C, HE P, et al. News topic text classification based on RoBERTa-RCNN and attention pooling [J/OL]. Journal of Zhengzhou University (Natural Science Edition): 1-8 [2023-05-13].
[7] OCH F J, GILDEA D, KHUDANPUR S, et al. A smorgasbord of features for statistical machine translation [C]// Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004. Stroudsburg: ACL, 2004: 161-168.
[8] ZHANG Y, NIVRE J. Transition-based dependency parsing with rich non-local features [C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2011: 188-193.
[9] LIU P, YUAN W, FU J, et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing [J]. ACM Computing Surveys, 2023, 55(9): 195.
[10] SCHICK T, SCHÜTZE H. Exploiting cloze-questions for few-shot text classification and natural language inference [C]// Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Stroudsburg: ACL, 2021: 255-269.
[11] SCAO T L, RUSH A. How many data points is a prompt worth? [C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 2627-2636.
[12] GAO T, FISCH A, CHEN D. Making pre-trained language models better few-shot learners [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg: ACL, 2021: 3816-3830.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010.
[14] RONG X. word2vec parameter learning explained [EB/OL]. (2014-11-11) [2023-05-13].
[15] PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation [C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2014: 1532-1543.
[16] JIANG Z, XU F F, ARAKI J, et al. How can we know what language models know? [J]. Transactions of the Association for Computational Linguistics, 2020, 8: 423-438.
[17] SHIN T, RAZEGHI Y, LOGAN IV R L, et al. AutoPrompt: eliciting knowledge from language models with automatically generated prompts [C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2020: 4222-4235.
[18] LIU X, ZHENG Y, DU Z, et al. GPT understands, too [EB/OL]. (2021-05-18) [2023-05-13].
[19] LI X L, LIANG P. Prefix-tuning: optimizing continuous prompts for generation [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg: ACL, 2021: 4582-4597.
[20] HAMBARDZUMYAN K, KHACHATRIAN H, MAY J. WARP: word-level adversarial reprogramming [C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Stroudsburg: ACL, 2021: 4921-4933.
[21] SCHICK T, SCHMID H, SCHÜTZE H. Automatically identifying words that can serve as labels for few-shot text classification [C]// Proceedings of the 28th International Conference on Computational Linguistics. [S.l.]: International Committee on Computational Linguistics, 2020: 5569-5578.
[22] WEI J, HUANG C, VOSOUGHI S, et al. Few-shot text classification with triplet networks, data augmentation, and curriculum learning [C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 5493-5500.
[23] MIYATO T, DAI A M, GOODFELLOW I. Adversarial training methods for semi-supervised text classification [EB/OL]. (2016-05-25) [2023-05-13].
[24] CHEN J, YANG Z, YANG D. MixText: linguistically-informed interpolation of hidden space for semi-supervised text classification [C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 2147-2157.
[25] SUN Z, FAN C, SUN X, et al. Neural semi-supervised learning for text classification under large-scale pretraining [EB/OL]. (2020-11-19) [2023-05-13].
[26] XIONG W, GONG Y. Text classification based on meta learning for unbalanced small samples [J]. Journal of Chinese Information Processing, 2022, 36(1): 104-116.
[27] YAO H, WU Y-X, AL-SHEDIVAT M, et al. Knowledge-aware meta-learning for low-resource text classification [C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 1814-1821.
[28] SCHICK T, SCHÜTZE H. It's not just size that matters: small language models are also few-shot learners [C]// Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2021: 2339-2352.
[29] YU B H, CAI X Y, WEI J X. Few-shot text classification method based on prompt learning [J]. Journal of Computer Applications, 2023, 43(9): 2735-2740.
[30] HU S, DING N, WANG H, et al. Knowledgeable prompt-tuning: incorporating knowledge into prompt verbalizer for text classification [C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2022: 2225-2240.
[31] MENG Y, ZHANG Y, HUANG J, et al. Text classification using label names only: a language model self-training approach [C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2020: 9006-9017.
[32] PEREZ E, KIELA D, CHO K. True few-shot learning with language models [J]. Advances in Neural Information Processing Systems, 2021, 34: 11054-11070.
[33] CUI Y, CHE W, LIU T, et al. Pre-training with whole word masking for Chinese BERT [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3504-3514.
[34] DING N, HU S, ZHAO W, et al. OpenPrompt: an open-source framework for prompt-learning [C]// Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. Stroudsburg: ACL, 2022: 105-113.
[35] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning [C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2021: 3045-3059.