Integrating posterior probability calibration training into text classification algorithm

Journal of Computer Applications, 2022, Vol. 42, Issue (6): 1789-1795. DOI: 10.11772/j.issn.1001-9081.2021091638

Special Issue: The 18th CCF Conference on Web Information Systems and Applications

Jing JIANG¹, Yu CHEN², Jieping SUN¹, Shenggen JU¹

1. College of Computer Science, Sichuan University, Chengdu Sichuan 610065, China
2. College of Science and Technology, Sichuan Minzu College, Kangding Sichuan 626001, China

Received: 2021-09-27 Revised: 2021-11-15 Accepted: 2021-11-17 Online: 2022-04-15 Published: 2022-06-10
Contact: Shenggen JU
About author: JIANG Jing, born in 1996 in Chongqing, M. S. candidate. Her research interests include natural language processing and knowledge graphs.
CHEN Yu, born in 1974 in Yilong, Sichuan, M. S., professor. His research interests include natural language processing and human-computer interaction.
SUN Jieping, born in 1962 in Chengdu, Sichuan, M. S., associate professor. His research interests include intelligent information processing and intelligent education.
Supported by: National Natural Science Foundation of China (61972270); Key Research and Development Project in Sichuan Province (2019YFG0521)

Abstract

Pre-trained language models used for text representation achieve high accuracy on a variety of text classification tasks, but two problems remain. First, after computing the posterior probabilities over all categories, a pre-trained language model selects the category with the largest posterior probability as its final classification result; in many scenarios, however, the quality of the posterior probability itself provides more reliable information than the final classification result. Second, the classifier of a pre-trained language model suffers performance degradation when it must assign different labels to texts with similar semantics. To address these two problems, a model named PosCal-negative, which combines posterior probability calibration with negative example supervision, was proposed. In the PosCal-negative model, the difference between the predicted probability and the empirical posterior probability was dynamically penalized in an end-to-end way during training, and texts with different labels were used to apply negative supervision to the encoder, so that different feature vector representations were generated for different categories. Experimental results show that the classification accuracies of the proposed model on two Chinese maternal and child care text classification datasets, MATINF-C-AGE and MATINF-C-TOPIC, reach 91.55% and 69.19% respectively, which are 1.13 and 2.53 percentage points higher than those of the Enhanced Representation through kNowledge IntEgration (ERNIE) model.

Key words: text classification, posterior probability calibration, pre-training language model, negative supervision, deep learning
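
The two training signals named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the binning scheme, the squared-error form of the calibration penalty, the use of cosine similarity for negative supervision, and the function names `calibration_penalty` and `negative_supervision_penalty` are all assumptions made for illustration. How the terms are weighted against the classification loss is not stated on this page, so no combined loss is shown.

```python
import math


def calibration_penalty(probs, labels, num_bins=10):
    """Mean squared gap between predicted and empirical posterior probabilities.

    probs:  list of per-example probability lists (one entry per class)
    labels: list of gold class indices
    Assumption: equal-width bins and a squared-error gap.
    """
    num_classes = len(probs[0])
    total, count = 0.0, 0
    for k in range(num_classes):
        # Group examples by the bin their predicted probability for class k falls into.
        bins = [[] for _ in range(num_bins)]
        for p, y in zip(probs, labels):
            b = min(int(p[k] * num_bins), num_bins - 1)
            bins[b].append((p[k], 1.0 if y == k else 0.0))
        for members in bins:
            if not members:
                continue
            # Empirical posterior: share of examples in this bin whose gold label is k.
            empirical = sum(hit for _, hit in members) / len(members)
            for q, _ in members:
                total += (q - empirical) ** 2
                count += 1
    return total / count


def negative_supervision_penalty(vectors, labels):
    """Average cosine similarity over all pairs of texts with different labels.

    Assumption: cosine similarity between encoder vectors is the quantity penalized.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    sims = [
        cosine(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if labels[i] != labels[j]
    ]
    return sum(sims) / len(sims) if sims else 0.0
```

Under these assumptions, a batch whose predicted probabilities match the observed label frequencies in every bin receives a calibration penalty of zero, and the negative supervision term shrinks as encoder vectors of differently labelled texts move apart.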

CLC Number:

TP391

Jing JIANG, Yu CHEN, Jieping SUN, Shenggen JU. Integrating posterior probability calibration training into text classification algorithm[J]. Journal of Computer Applications, 2022, 42(6): 1789-1795.

Figures/Tables 8

Tab. 1 Examples of text classification using BERT on MedWeb dataset

Sentence	Label	BERT classification
A cold is a legit disease.	—	Cold
Oh my god！ I caught a cold！	Cold	Cold

Fig. 1 Overall framework of proposed model

Tab. 2 Examples of MATINF-C dataset

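Text example from the maternal and infant care dataset	Category
宝宝为什么总是吐舌头啊？ (Why does my baby keep sticking out their tongue?)	Question
我家宝宝出生快满四个月了，这几天我突然发现宝宝总是吐舌头，而且口水也很多，那么这到底是咋回事啊？ (My baby is almost four months old. Over the past few days I suddenly noticed that the baby keeps sticking out their tongue and drools a lot. What is going on?)	Description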

Tab. 3 Hyperparameter setting

Parameter	AGE	TOPIC	Parameter	AGE	TOPIC
$\lambda_1$	0.7	0.5	u	5	5
$\lambda_2$	0.3	0.5	n	4	4

Tab. 4 Comparison of accuracy of different models (%)

Model type	Model	MATINF-C-AGE	MATINF-C-TOPIC
CNN and its variants	Text CNN [24]	90.95	64.41
	DCNN [25]	90.96	64.60
	RCNN [27]	90.81	63.56
	fastText [28]	87.76	61.81
	DPCNN [29]	91.02	65.92
Pre-trained language models	BERT-base [30]	90.33	66.95
	BERT-of-Theseus [31]	90.25	66.72
	ERNIE [32]	90.42	66.66
Posterior probability calibration models	Temp [14]	90.86	68.04
	PosCal-negative	91.55	69.19
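
As a quick arithmetic check, the gains over ERNIE quoted in the abstract can be recomputed from the table values. The dictionary layout and the helper name `gain_in_points` are illustrative only.

```python
# Accuracy values (%) copied from Tab. 4.
accuracy = {
    "ERNIE": {"MATINF-C-AGE": 90.42, "MATINF-C-TOPIC": 66.66},
    "PosCal-negative": {"MATINF-C-AGE": 91.55, "MATINF-C-TOPIC": 69.19},
}


def gain_in_points(model, baseline, dataset):
    """Absolute accuracy difference in percentage points, rounded to 2 decimals."""
    return round(accuracy[model][dataset] - accuracy[baseline][dataset], 2)


for dataset in ("MATINF-C-AGE", "MATINF-C-TOPIC"):
    print(dataset, gain_in_points("PosCal-negative", "ERNIE", dataset))
```

The differences are absolute (percentage points), not relative improvements, which matches the wording of the abstract.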

Tab. 5 Accuracy of ablation experiment (%)

Model	MATINF-C-AGE	MATINF-C-TOPIC
BERT-base	90.33	66.95
BERT-base+PosCal	91.25	68.77
BERT-base+Negative	90.87	68.04
PosCal-negative	91.55	69.19

Tab. 6 Comparison of ECE

Model	MATINF-C-AGE	MATINF-C-TOPIC
BERT-base	0.117976	0.116114
Temp	0.148775	0.139897
PosCal-negative	0.113868	0.105009
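
For readers unfamiliar with the metric, the expected calibration error (ECE) is commonly computed by binning predictions by confidence and averaging the gap between accuracy and mean confidence in each bin, weighted by bin size; lower values indicate better calibration. The sketch below follows that common definition. The number of bins and the equal-width binning are assumptions, since the settings behind Tab. 6 are not given on this page.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE = sum over bins of (bin size / N) * |accuracy - mean confidence|.

    confidences: predicted probability of the chosen class for each example
    correct:     True/False flags saying whether each prediction was right
    Assumption: equal-width confidence bins over [0, 1].
    """
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * num_bins), num_bins - 1)
        bins[b].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        acc = sum(hit for _, hit in members) / len(members)
        ece += len(members) / n * abs(acc - mean_conf)
    return ece
```

For example, four predictions made with confidence 0.9, of which three are correct, give an accuracy of 0.75 in that bin and therefore an ECE of 0.15.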

Tab. 7 Accuracy comparison of negative supervision module (%)

Model	MATINF-C-AGE	MATINF-C-TOPIC
PosCal-ACE	90.12	66.77
PosCal-AM	90.56	67.48
PosCal-negative	91.55	69.19

References 32

1	WANG S， MANNING C D. Baselines and bigrams： simple， good sentiment and topic classification［C］// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Stroudsburg， PA： ACL， 2012： 90-94.
2	WANG G Y， LI C Y， WANG W L， et al. Joint embedding of words and labels for text classification［C］// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg， PA： ACL， 2018：2321-2331. 10.18653/v1/p18-1216
3	ZHANG X， ZHAO J B， LeCUN Y. Character-level convolutional networks for text classification［C］// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge： MIT Press， 2015： 649-657. 10.1109/icip.2015.7351229
4	SHEN D H， ZHANG Y Z， HENAO R， et al. Deconvolutional latent-variable model for text sequence matching［C］// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2018： 5438-5445.
5	YANG P C， SUN X， LI W， et al. SGM： sequence generation model for multi-label classification［C］// Proceedings of the 27th International Conference on Computational Linguistics. Stroudsburg， PA： ACL， 2018：3915-3926. 10.18653/v1/p19-1518
6	JIANG X Q， OSL M， KIM J， et al. Calibrating predictive model estimates to support personalized medicine［J］. Journal of the American Medical Informatics Association， 2012， 19（2）： 263-274. 10.1136/amiajnl-2011-000291
7	MURPHY A H. A new vector partition of the probability score［J］. Journal of Applied Meteorology and Climatology， 1973， 12（4）： 595-600. 10.1175/1520-0450(1973)012<0595:anvpot>2.0.co;2
8	MURPHY A H， WINKLER R L. Reliability of subjective probability forecasts of precipitation and temperature［J］. Journal of the Royal Statistical Society： Series C （Applied Statistics）， 1977， 26（1）： 41-47. 10.2307/2346866
9	DEGROOT M H， FIENBERG S E. The comparison and evaluation of forecasters［J］. Journal of the Royal Statistical Society： Series D （The Statistician）， 1983， 32（1/2）： 12-22. 10.2307/2987588
10	GNEITING T， RAFTERY A E. Weather forecasting with ensemble methods［J］. Science， 2005， 310（5746）： 248-249. 10.1126/science.1115255
11	BRÖCKER J. Reliability， sufficiency， and the decomposition of proper scores［J］. Quarterly Journal of the Royal Meteorological Society， 2009， 135（643）： 1512-1519. 10.1002/qj.456
12	NGUYEN K， O’CONNOR B. Posterior calibration and exploratory analysis for natural language processing models［C］// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Stroudsburg， PA： ACL， 2015： 1587-1598. 10.18653/v1/d15-1182
13	CARD D， SMITH N A. The importance of calibration for estimating proportions from annotations［C］// Proceedings of the 2018 Conference of the North American Chapter of the Association of the Computational Linguistics： Human Language Technologies， Volume 1 （Long Papers）. Stroudsburg， PA： ACL， 2018： 1636-1646. 10.18653/v1/n18-1148
14	GUO C， PLEISS G， SUN Y， et al. On calibration of modern neural networks［C］// Proceedings of the 34th International Conference on Machine Learning. New York： JMLR.org， 2017： 1321-1330.
15	KUMAR A， LIANG P， MA T Y. Verified uncertainty calibration［C/OL］// Proceedings of the 33rd International Conference on Neural Information Processing Systems. ［2021-03-30］.
16	WAKAMIYA S， MORITA M， KANO Y， et al. Overview of the NTCIR-13： MedWeb task［C］// Proceedings of the 13th NTCIR Conference on Evaluation of Information Access Technologies. Tokyo： National Institute of Informatics， 2017： 40-49.
17	刘婷婷，朱文东，刘广一. 基于深度学习的文本分类研究进展［J］. 电力信息与通信技术， 2018， 16（3）：1-7. 10.16543/j.2095-641x.electric.power.ict.2018.03.001
	LIU T T， ZHU W D， LIU G Y. Advances in deep learning based text classification［J］. Electric Power Information and Communication Technology， 2018， 16（3）：1-7. 10.16543/j.2095-641x.electric.power.ict.2018.03.001
18	何力，郑灶贤，项凤涛，等. 基于深度学习的文本分类技术研究进展［J］. 计算机工程， 2021， 47（2）：1-11. 10.19678/j.issn.1000-3428.0059099
	HE L， ZHENG Z X， XIANG F T， et al. Research progress of text classification technology based on deep learning［J］. Computer Engineering， 2021， 47（2）：1-11. 10.19678/j.issn.1000-3428.0059099
19	ZADROZNY B， ELKAN C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers［C］// Proceedings of the 18th International Conference on Machine Learning. San Francisco： Morgan Kaufmann Publishers Inc.， 2001： 609-616. 10.1145/775047.775151
20	NAEINI M P， COOPER G F， HAUSKRECHT M. Obtaining well calibrated probabilities using Bayesian binning［C］// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto： AAAI Press， 2015： 2901-2907. 10.1137/1.9781611974010.24
21	PLATT J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods［M］// SMOLA A J， BARTLETT P，SCHÖLKOPF B， et al. Advances in Large Margin Classifiers. Cambridge： MIT Press， 2000： 61-74.
22	OHASHI S， TAKAYAMA J， KAJIWARA T， et al. Text classification with negative supervision［C］// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg， PA： ACL， 2020： 351-357. 10.18653/v1/2020.acl-main.33
23	XU C W， PEI J X， WU H T， et al. MATINF： a jointly labeled large-scale dataset for classification， question answering and summarization［C］// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg， PA： ACL， 2020： 3586-3596. 10.18653/v1/2020.acl-main.330
24	KIM Y. Convolutional neural networks for sentence classification［C］// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg， PA： ACL， 2014： 1746-1751. 10.3115/v1/d14-1181
25	KALCHBRENNER N， GREFENSTETTE E， BLUNSOM P. A convolutional neural network for modelling sentences［C］// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics （Volume 1： Long Papers）. Stroudsburg， PA： ACL， 2014： 655-665. 10.3115/v1/p14-1062
26	杜思佳，于海宁，张宏莉. 基于深度学习的文本分类研究进展［J］. 网络与信息安全学报， 2020， 6（4）：1-13. 10.11959/j.issn.2096-109x.2020010
	DU S J， YU H N， ZHANG H L. Survey of text classification methods based on deep learning［J］. Chinese Journal of Network and Information Security， 2020， 6（4）：1-13. 10.11959/j.issn.2096-109x.2020010
27	LAI S W， XU L H， LIU K， et al. Recurrent convolutional neural networks for text classification［C］// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto， CA： AAAI Press， 2015： 2267-2273. 10.1609/aaai.v33i01.33017370
28	JOULIN A， GRAVE E， BOJANOWSKI P， et al. Bag of tricks for efficient text classification［C］// Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics， Volume 2 （Short Papers）. Stroudsburg， PA： ACL， 2017：427-431. 10.18653/v1/e17-2068
29	JOHNSON R， ZHANG T. Deep pyramid convolutional neural networks for text categorization［C］// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics， Volume 1 （Long Papers）. Stroudsburg， PA： ACL， 2017： 562-570. 10.18653/v1/p17-1052
30	DEVLIN J， CHANG M W， LEE K， et al. BERT： pre-training of deep bidirectional transformers for language understanding［C］// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics： Human Language Technologies， Volume 1 （Long and Short Papers）. Stroudsburg， PA： ACL， 2019： 4171-4186. 10.18653/v1/n19-1423
31	XU C W， ZHOU W C S， GE T， et al. BERT-of-Theseus： compressing BERT by progressive module replacing［C］// Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Stroudsburg， PA： ACL， 2020： 7859-7869. 10.18653/v1/2020.emnlp-main.633
32	ZHANG Z Y， HAN X， LIU Z Y， et al. ERNIE： enhanced language representation with informative entities［C］// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg， PA： ACL， 2019： 1441-1451. 10.18653/v1/p19-1139
