Integrating posterior probability calibration training into text classification algorithm

doi:10.11772/j.issn.1001-9081.2021091638

Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (6): 1789-1795. DOI: 10.11772/j.issn.1001-9081.2021091638

Special topic: The 18th CCF China Conference on Information Systems and Applications

Jing JIANG¹, Yu CHEN², Jieping SUN¹, Shenggen JU¹

^1.College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
^2.College of Science and Technology, Sichuan Minzu College, Kangding, Sichuan 626001, China

Received: 2021-09-27 Revised: 2021-11-15 Accepted: 2021-11-17 Online: 2022-04-15 Published: 2022-06-10
Contact: Shenggen JU
About the authors: JIANG Jing, born in 1996 in Chongqing, M.S. candidate. Her research interests include natural language processing and knowledge graphs.
CHEN Yu, born in 1974 in Yilong, Sichuan, M.S., professor. His research interests include natural language processing and human-computer interaction.
SUN Jieping, born in 1962 in Chengdu, Sichuan, M.S., associate professor. His research interests include intelligent information processing and intelligent education.
Supported by:
National Natural Science Foundation of China (61972270); Key Research and Development Project in Sichuan Province (2019YFG0521)

Abstract

Abstract:

Pre-trained language models used for text representation achieve high accuracy on a variety of text classification tasks, but two problems remain. First, after computing the posterior probabilities of all categories, a pre-trained language model simply selects the category with the largest posterior probability as its final classification result; yet in many scenarios the quality of the posterior probabilities themselves provides more reliable information than the classification result. Second, the classifier of a pre-trained language model suffers performance degradation when it has to assign different labels to semantically similar texts. To address these two problems, a model named PosCal-negative, which combines posterior probability calibration with negative example supervision, is proposed. During training, the model dynamically penalizes the difference between the predicted probability and the empirical posterior probability in an end-to-end way, and uses texts with different labels to provide negative supervision for the encoder, so that distinct feature vector representations are generated for different categories. Experimental results show that the classification accuracies of PosCal-negative on the two Chinese maternal and child care text classification datasets MATINF-C-AGE and MATINF-C-TOPIC reach 91.55% and 69.19% respectively, which are 1.13 and 2.53 percentage points higher than those of the Enhanced Representation through kNowledge IntEgration (ERNIE) model.

Key words: text classification, posterior probability calibration, pre-training language model, negative supervision, deep learning
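
The calibration idea in the abstract, penalizing the gap between predicted probabilities and empirical posterior probabilities, can be illustrated with a small sketch. The abstract does not give the loss formula, so the equal-width binning, the squared-error distance, and the names `empirical_posteriors`, `calibration_penalty`, and `num_bins` are assumptions made for illustration rather than the authors' implementation.

```python
def bin_index(p, num_bins):
    """Map a probability in [0, 1] to an equal-width bin index."""
    return min(int(p * num_bins), num_bins - 1)


def empirical_posteriors(probs, labels, num_bins=5):
    """For each class k and bin b, return the fraction of samples whose
    predicted probability for k falls in b and whose gold label is k."""
    num_classes = len(probs[0])
    counts = [[0] * num_bins for _ in range(num_classes)]
    hits = [[0] * num_bins for _ in range(num_classes)]
    for p, y in zip(probs, labels):
        for k in range(num_classes):
            b = bin_index(p[k], num_bins)
            counts[k][b] += 1
            hits[k][b] += int(y == k)
    return [
        [hits[k][b] / counts[k][b] if counts[k][b] else 0.0
         for b in range(num_bins)]
        for k in range(num_classes)
    ]


def calibration_penalty(probs, labels, num_bins=5):
    """Mean squared gap between each predicted probability and the
    empirical posterior of the bin it falls into (assumed distance)."""
    q = empirical_posteriors(probs, labels, num_bins)
    total = 0.0
    for p in probs:
        for k, pk in enumerate(p):
            total += (pk - q[k][bin_index(pk, num_bins)]) ** 2
    return total / (len(probs) * len(probs[0]))


# Confident and correct predictions incur no penalty;
# confident and wrong predictions incur the maximum penalty.
good = calibration_penalty([[1.0, 0.0], [0.0, 1.0]], [0, 1])
bad = calibration_penalty([[1.0, 0.0], [0.0, 1.0]], [1, 0])
```

In training, a penalty of this kind would typically be added to the cross-entropy loss with a weighting coefficient; how PosCal-negative weights its loss terms is not stated in the abstract.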

CLC number: TP391

Cite this article: Jing JIANG, Yu CHEN, Jieping SUN, Shenggen JU. Integrating posterior probability calibration training into text classification algorithm[J]. Journal of Computer Applications, 2022, 42(6): 1789-1795.

Figures/Tables (8)

Tab. 1 Examples of text classification using BERT on the MedWeb dataset

Sentence	Label	BERT classification
A cold is a legit disease.	—	Cold
Oh my god! I caught a cold!	Cold	Cold

Fig. 1 Overall framework of the proposed model

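Tab. 2 Examples from the MATINF-C dataset

Text example from the maternal and infant care dataset	Category
Why does my baby keep sticking out the tongue?	Question
My baby is almost four months old. Over the past few days I have suddenly noticed that the baby keeps sticking out the tongue and drools a lot. What is going on?	Description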

Tab. 3 Hyperparameter settings

Parameter	AGE	TOPIC	Parameter	AGE	TOPIC
$\lambda_1$	0.7	0.5	u	5	5
$\lambda_2$	0.3	0.5	n	4	4

Tab. 4 Comparison of accuracy of different models (%)

Model category	Model	MATINF-C-AGE	MATINF-C-TOPIC
CNN and its variants	Text CNN^[24]	90.95	64.41
	DCNN^[25]	90.96	64.60
	RCNN^[27]	90.81	63.56
	fastText^[28]	87.76	61.81
	DPCNN^[29]	91.02	65.92
Pre-trained language models	BERT-base^[30]	90.33	66.95
	BERT-of-Theseus^[31]	90.25	66.72
	ERNIE^[32]	90.42	66.66
Posterior probability calibration models	Temp^[14]	90.86	68.04
	PosCal-negative	91.55	69.19

Tab. 5 Accuracy of ablation experiments (%)

Model	MATINF-C-AGE	MATINF-C-TOPIC
BERT-base	90.33	66.95
BERT-base+PosCal	91.25	68.77
BERT-base+Negative	90.87	68.04
PosCal-negative	91.55	69.19

Tab. 6 Comparison of ECE

Model	MATINF-C-AGE	MATINF-C-TOPIC
BERT-base	0.117976	0.116114
Temp	0.148775	0.139897
PosCal-negative	0.113868	0.105009
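
Table 6 compares models by ECE (expected calibration error), where lower values indicate better calibration. As a reading aid, the following is a minimal sketch of the commonly used binned definition, in which each bin contributes the absolute gap between its accuracy and its mean confidence, weighted by the share of samples it holds. The page does not state which binning the authors used, so the equal-width bins, the default `num_bins=10`, and the function name are assumptions.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Binned ECE: sum over bins of (bin size / n) * |accuracy - confidence|.

    confidences: top predicted probability for each sample, in [0, 1]
    correct:     1 if the prediction was right, else 0
    """
    n = len(confidences)
    conf_sum = [0.0] * num_bins
    acc_sum = [0.0] * num_bins
    count = [0] * num_bins
    for c, ok in zip(confidences, correct):
        b = min(int(c * num_bins), num_bins - 1)  # equal-width bins (assumed)
        conf_sum[b] += c
        acc_sum[b] += ok
        count[b] += 1
    ece = 0.0
    for b in range(num_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        accuracy = acc_sum[b] / count[b]
        confidence = conf_sum[b] / count[b]
        ece += (count[b] / n) * abs(accuracy - confidence)
    return ece


# Two samples at confidence 0.9 (both correct) and
# two at confidence 0.6 (one correct): each bin has a gap of 0.1.
ece = expected_calibration_error([0.9, 0.9, 0.6, 0.6], [1, 1, 1, 0])
```

A perfectly calibrated model would have accuracy equal to confidence in every bin and therefore an ECE of 0.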

Tab. 7 Accuracy comparison of negative supervision modules (%)

Model	MATINF-C-AGE	MATINF-C-TOPIC
PosCal-ACE	90.12	66.77
PosCal-AM	90.56	67.48
PosCal-negative	91.55	69.19
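
Table 7 concerns the negative supervision module, which according to the abstract uses texts with different labels to push the encoder toward distinct representations per category. One simple way to express such an objective is to penalize the cosine similarity between encodings of differently labeled texts. The sketch below illustrates that idea only; the pairing strategy, the use of cosine similarity, and the function names are assumptions, since this page does not describe the module's actual loss.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def negative_supervision_loss(encodings, labels):
    """Mean cosine similarity over all pairs of texts with different labels.

    Minimizing this value pushes encodings of differently labeled texts apart.
    """
    total, pairs = 0.0, 0
    for i in range(len(encodings)):
        for j in range(i + 1, len(encodings)):
            if labels[i] != labels[j]:
                total += cosine(encodings[i], encodings[j])
                pairs += 1
    return total / pairs if pairs else 0.0


# Differently labeled texts with orthogonal encodings give a loss of 0;
# differently labeled texts with identical encodings give a loss of 1.
separated = negative_supervision_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]], [0, 1, 0])
collapsed = negative_supervision_loss([[1.0, 0.0], [1.0, 0.0]], [0, 1])
```

Pairs that share a label are skipped, so the term only discourages similarity across categories and leaves same-category encodings free to cluster.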

References (32)

1	WANG S, MANNING C D. Baselines and bigrams: simple, good sentiment and topic classification[C]// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2012: 90-94.
2	WANG G Y, LI C Y, WANG W L, et al. Joint embedding of words and labels for text classification[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2018: 2321-2331. DOI: 10.18653/v1/p18-1216
3	ZHANG X, ZHAO J B, LeCUN Y. Character-level convolutional networks for text classification[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 649-657. DOI: 10.1109/icip.2015.7351229
4	SHEN D H, ZHANG Y Z, HENAO R, et al. Deconvolutional latent-variable model for text sequence matching[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 5438-5445.
5	YANG P C, SUN X, LI W, et al. SGM: sequence generation model for multi-label classification[C]// Proceedings of the 27th International Conference on Computational Linguistics. Stroudsburg, PA: ACL, 2018: 3915-3926. DOI: 10.18653/v1/p19-1518
6	JIANG X Q, OSL M, KIM J, et al. Calibrating predictive model estimates to support personalized medicine[J]. Journal of the American Medical Informatics Association, 2012, 19(2): 263-274. DOI: 10.1136/amiajnl-2011-000291
7	MURPHY A H. A new vector partition of the probability score[J]. Journal of Applied Meteorology and Climatology, 1973, 12(4): 595-600. DOI: 10.1175/1520-0450(1973)012<0595:anvpot>2.0.co;2
8	MURPHY A H, WINKLER R L. Reliability of subjective probability forecasts of precipitation and temperature[J]. Journal of the Royal Statistical Society: Series C (Applied Statistics), 1977, 26(1): 41-47. DOI: 10.2307/2346866
9	DEGROOT M H, FIENBERG S E. The comparison and evaluation of forecasters[J]. Journal of the Royal Statistical Society: Series D (The Statistician), 1983, 32(1/2): 12-22. DOI: 10.2307/2987588
10	GNEITING T, RAFTERY A E. Weather forecasting with ensemble methods[J]. Science, 2005, 310(5746): 248-249. DOI: 10.1126/science.1115255
11	BRÖCKER J. Reliability, sufficiency, and the decomposition of proper scores[J]. Quarterly Journal of the Royal Meteorological Society, 2009, 135(643): 1512-1519. DOI: 10.1002/qj.456
12	NGUYEN K, O’CONNOR B. Posterior calibration and exploratory analysis for natural language processing models[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2015: 1587-1598. DOI: 10.18653/v1/d15-1182
13	CARD D, SMITH N A. The importance of calibration for estimating proportions from annotations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: ACL, 2018: 1636-1646. DOI: 10.18653/v1/n18-1148
14	GUO C, PLEISS G, SUN Y, et al. On calibration of modern neural networks[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 1321-1330.
15	KUMAR A, LIANG P, MA T Y. Verified uncertainty calibration[C/OL]// Proceedings of the 33rd International Conference on Neural Information Processing Systems. [2021-03-30].
16	WAKAMIYA S, MORITA M, KANO Y, et al. Overview of the NTCIR-13: MedWeb task[C]// Proceedings of the 13th NTCIR Conference on Evaluation of Information Access Technologies. Tokyo: National Institute of Informatics, 2017: 40-49.
17	LIU T T, ZHU W D, LIU G Y. Advances in deep learning based text classification[J]. Electric Power Information and Communication Technology, 2018, 16(3): 1-7. DOI: 10.16543/j.2095-641x.electric.power.ict.2018.03.001
18	HE L, ZHENG Z X, XIANG F T, et al. Research progress of text classification technology based on deep learning[J]. Computer Engineering, 2021, 47(2): 1-11. DOI: 10.19678/j.issn.1000-3428.0059099
19	ZADROZNY B, ELKAN C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers[C]// Proceedings of the 18th International Conference on Machine Learning. San Francisco: Morgan Kaufmann Publishers Inc., 2001: 609-616. DOI: 10.1145/775047.775151
20	NAEINI M P, COOPER G F, HAUSKRECHT M. Obtaining well calibrated probabilities using Bayesian binning[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2015: 2901-2907. DOI: 10.1137/1.9781611974010.24
21	PLATT J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods[M]// SMOLA A J, BARTLETT P, SCHÖLKOPF B, et al. Advances in Large Margin Classifiers. Cambridge: MIT Press, 2000: 61-74.
22	OHASHI S, TAKAYAMA J, KAJIWARA T, et al. Text classification with negative supervision[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 351-357. DOI: 10.18653/v1/2020.acl-main.33
23	XU C W, PEI J X, WU H T, et al. MATINF: a jointly labeled large-scale dataset for classification, question answering and summarization[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2020: 3586-3596. DOI: 10.18653/v1/2020.acl-main.330
24	KIM Y. Convolutional neural networks for sentence classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2014: 1746-1751. DOI: 10.3115/v1/d14-1181
25	KALCHBRENNER N, GREFENSTETTE E, BLUNSOM P. A convolutional neural network for modelling sentences[C]// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA: ACL, 2014: 655-665. DOI: 10.3115/v1/p14-1062
26	DU S J, YU H N, ZHANG H L. Survey of text classification methods based on deep learning[J]. Chinese Journal of Network and Information Security, 2020, 6(4): 1-13. DOI: 10.11959/j.issn.2096-109x.2020010
27	LAI S W, XU L H, LIU K, et al. Recurrent convolutional neural networks for text classification[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2015: 2267-2273. DOI: 10.1609/aaai.v33i01.33017370
28	JOULIN A, GRAVE E, BOJANOWSKI P, et al. Bag of tricks for efficient text classification[C]// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Volume 2 (Short Papers). Stroudsburg, PA: ACL, 2017: 427-431. DOI: 10.18653/v1/e17-2068
29	JOHNSON R, ZHANG T. Deep pyramid convolutional neural networks for text categorization[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 1 (Long Papers). Stroudsburg, PA: ACL, 2017: 562-570. DOI: 10.18653/v1/p17-1052
30	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: ACL, 2019: 4171-4186. DOI: 10.18653/v1/n19-1423
31	XU C W, ZHOU W C S, GE T, et al. BERT-of-Theseus: compressing BERT by progressive module replacing[C]// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: ACL, 2020: 7859-7869. DOI: 10.18653/v1/2020.emnlp-main.633
32	ZHANG Z Y, HAN X, LIU Z Y, et al. ERNIE: enhanced language representation with informative entities[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2019: 1441-1451. DOI: 10.18653/v1/p19-1139
