Special topic: Artificial Intelligence
余新言,曾诚,王乾,何鹏,丁晓玉
Corresponding author: 曾诚
Received: 2023-06-05
Revised: 2023-11-15
Accepted: 2023-12-05
Online: 2026-02-05
Published: 2024-06-10
Abstract: Classification methods based on pre-training and fine-tuning usually require large amounts of labeled data, which makes them inapplicable to few-shot classification tasks. Therefore, a classification method based on Knowledge enhancement and Prompt Learning (KPL) was proposed for the Chinese few-shot news topic classification task. First, the optimal prompt template was learned on the training set with a pre-trained language model; then, the template was combined with the input text, turning the classification task into a cloze task; at the same time, external knowledge was used to expand the label word space and enrich the semantic information of the label words; finally, the predicted label words were mapped back to the original labels. Experiments on few-shot splits sampled from the three news datasets THUCNews, SHNews, and Toutiao show that the proposed method improves overall performance on the 1-shot, 5-shot, 10-shot, and 20-shot tasks, with a particularly marked gain on the 1-shot task: compared with the baseline few-shot classification methods, accuracy increases by 7.95, 2.11, and 3.1 percentage points on the three datasets respectively, verifying the effectiveness of knowledge enhancement and prompt learning for few-shot news topic classification.
余新言, 曾诚, 王乾, 何鹏, 丁晓玉. Few-shot news topic classification method based on knowledge enhancement and prompt learning [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2023050709.
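The KPL pipeline described in the abstract (wrap the input in a prompt template, let a masked language model fill the cloze slot, score a knowledge-expanded label word set, and map the winning label word back to a class) can be made concrete with a short example. The sketch below is an illustration, not the paper's implementation: it assumes Hugging Face transformers with the bert-base-chinese checkpoint as a stand-in backbone, uses the template 这是一条[MASK]新闻:x from Table 3 and the SHNews label word sets from Table 1, and simplifies multi-character label words by scoring only their first character against the single [MASK] slot.

```python
# Minimal cloze-style classification sketch (assumptions noted above;
# the paper's template learning, training loop, and multi-token label
# word handling are not reproduced here).
import torch
from transformers import BertTokenizer, BertForMaskedLM

MODEL = "bert-base-chinese"  # assumed backbone, not necessarily the paper's
tokenizer = BertTokenizer.from_pretrained(MODEL)
model = BertForMaskedLM.from_pretrained(MODEL).eval()

# Knowledge-expanded verbalizer: each class maps to several label words
# drawn from external knowledge (SHNews examples from Table 1).
label_words = {
    "科技": ["高科技", "高新技术", "技术"],
    "文化": ["文明", "人文", "文风"],
    "旅游": ["出游", "旅行", "出行"],
}

def classify(text: str) -> str:
    # Wrap the input with the cloze template "这是一条[MASK]新闻:x".
    prompt = f"这是一条{tokenizer.mask_token}新闻:{text}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the [MASK] position and take its vocabulary scores.
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    mask_logits = logits[0, mask_pos]
    # Score each class by averaging over its label word set
    # (first character only -- a single-[MASK] simplification).
    scores = {}
    for label, words in label_words.items():
        ids = [tokenizer.convert_tokens_to_ids(w[0]) for w in words]
        scores[label] = mask_logits[ids].mean().item()
    # Map the best-scoring label word set back to the original label.
    return max(scores, key=scores.get)

print(classify("故宫博物院推出夜间参观专场"))
```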
Table 1 Examples of label word sets for different datasets

| Dataset | Label | Label word set |
|---|---|---|
| THUCNews | 房地产 | 房地产,房产,房地产业 |
| THUCNews | 金融 | 金银,金融业,金融市场 |
| THUCNews | 教育 | 高考,考生,高中,文综 |
| Toutiao | 电竞 | 网络游戏,竞技,玩家 |
| Toutiao | 农业 | 第一产业,农林牧副渔,农林 |
| Toutiao | 证券 | 出游,旅行,出行 |
| SHNews | 科技 | 高科技,高新技术,技术 |
| SHNews | 文化 | 文明,人文,文风 |
| SHNews | 旅游 | 出游,旅行,出行 |
Table 2 Statistics of the experimental datasets

| Dataset | Training set size | Validation set size | Test set size | Number of classes |
|---|---|---|---|---|
| THUCNews | 180 000 | 10 000 | 10 000 | 10 |
| Toutiao | 267 877 | 57 401 | 57 401 | 15 |
| SHNews | 22 699 | 5 764 | 5 755 | 12 |
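As the abstract notes, the few-shot training sets are formed by sampling from the full datasets in Table 2. Below is a minimal sketch of a standard per-class k-shot sampler; the paper's exact sampling protocol (random seeds, number of episodes, validation split handling) is an assumption and is not reproduced here.

```python
# Per-class k-shot sampling sketch (standard recipe, assumed to match
# the paper's setup only in spirit).
import random
from collections import defaultdict

def sample_k_shot(dataset, k, seed=42):
    """dataset: iterable of (text, label) pairs; returns k examples per label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in dataset:
        by_label[label].append((text, label))
    # Draw k examples from every class to build the few-shot training set.
    return [ex for items in by_label.values() for ex in rng.sample(items, k)]

# Example: a 5-shot split over a toy two-class corpus.
corpus = [("股市震荡", "金融")] * 10 + [("高考开考", "教育")] * 10
support = sample_k_shot(corpus, k=5)
```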
Table 3 Chinese templates for the experimental datasets

| Dataset | Template |
|---|---|
| THUCNews | 这是一条[MASK]新闻:x |
| THUCNews | [MASK]新闻:x |
| THUCNews | x是[MASK]新闻 |
| Toutiao | 这是一条[MASK]新闻:x |
| Toutiao | [MASK]新闻:x |
| Toutiao | 分类:[MASK]x |
| SHNews | 这是一条[MASK]新闻:x |
| SHNews | [MASK]新闻:x |
| SHNews | 主题:[MASK]x |
Table 4 Experimental results of 1/5/10/20-shot text classification on different datasets (%)

| k-shot | Model | THUCNews Acc | THUCNews Macro_F1 | Toutiao Acc | Toutiao Macro_F1 | SHNews Acc | SHNews Macro_F1 |
|---|---|---|---|---|---|---|---|
| 1 | FT | 48.27 | 45.90 | 28.86 | 28.71 | 28.78 | 26.79 |
| 1 | Soft-verb | 67.29 | 66.62 | 63.91 | 58.44 | 55.98 | 54.31 |
| 1 | Auto-verb | 34.38 | 31.98 | 37.11 | 31.27 | 30.36 | 27.50 |
| 1 | PET | 69.53 | 68.92 | 59.84 | 55.07 | 56.28 | 54.48 |
| 1 | Soft-prompt | 65.00 | 64.22 | 61.00 | 55.66 | 47.69 | 46.41 |
| 1 | KPL | 77.12 | 76.94 | 67.01 | 61.68 | 58.39 | 56.46 |
| 5 | FT | 78.72 | 78.67 | 67.94 | 68.08 | 57.72 | 58.28 |
| 5 | Soft-verb | 81.97 | 81.77 | 73.04 | 67.03 | 65.67 | 65.34 |
| 5 | Auto-verb | 76.47 | 75.52 | 68.58 | 62.84 | 58.78 | 58.45 |
| 5 | PET | 82.23 | 82.06 | 72.58 | 66.87 | 65.48 | 65.27 |
| 5 | Soft-prompt | 80.71 | 80.26 | 73.06 | 67.20 | 65.13 | 64.78 |
| 5 | KPL | 82.95 | 82.78 | 74.02 | 68.04 | 66.08 | 65.80 |
| 10 | FT | 81.53 | 81.43 | 72.16 | 72.74 | 64.29 | 64.44 |
| 10 | Soft-verb | 84.35 | 84.30 | 74.97 | 69.09 | 67.51 | 67.37 |
| 10 | Auto-verb | 80.72 | 79.72 | 73.84 | 67.68 | 64.98 | 64.46 |
| 10 | PET | 84.16 | 84.07 | 74.58 | 68.95 | 67.85 | 67.66 |
| 10 | Soft-prompt | 84.95 | 84.92 | 75.00 | 68.87 | 66.36 | 65.88 |
| 10 | KPL | 85.50 | 85.38 | 75.21 | 69.38 | 68.60 | 68.54 |
| 20 | FT | 83.88 | 83.89 | 74.88 | 75.15 | 66.36 | 66.53 |
| 20 | Soft-verb | 86.19 | 86.12 | 76.55 | 70.57 | 69.30 | 69.26 |
| 20 | Auto-verb | 81.78 | 79.95 | 76.50 | 70.19 | 68.15 | 67.81 |
| 20 | PET | 86.67 | 86.60 | 75.70 | 69.72 | 68.82 | 68.91 |
| 20 | Soft-prompt | 86.48 | 86.45 | 76.73 | 71.09 | 69.59 | 69.47 |
| 20 | KPL | 87.04 | 86.96 | 77.21 | 71.24 | 70.31 | 70.16 |
Table 5 Ablation experiment results on THUCNews (%)

| k-shot | P-tuning | Know | Acc | Macro_F1 |
|---|---|---|---|---|
| 1 | × | × | 72.19 | 71.84 |
| 1 | √ | × | 73.56 | 73.07 |
| 1 | × | √ | 76.91 | 76.75 |
| 1 | √ | √ | 77.12 | 76.94 |
| 5 | × | × | 81.29 | 81.01 |
| 5 | √ | × | 82.36 | 82.33 |
| 5 | × | √ | 82.33 | 82.14 |
| 5 | √ | √ | 82.95 | 82.78 |
| 10 | × | × | 84.58 | 84.56 |
| 10 | √ | × | 85.61 | 85.53 |
| 10 | × | √ | 85.56 | 85.50 |
| 10 | √ | √ | 85.50 | 85.38 |
| 20 | × | × | 86.63 | 86.61 |
| 20 | √ | × | 86.70 | 86.65 |
| 20 | × | √ | 86.99 | 86.94 |
| 20 | √ | √ | 87.04 | 86.96 |