Pre-hospital emergency text classification model based on label confusion

doi:10.11772/j.issn.1001-9081.2022020317

Journal of Computer Applications, 2023, Vol. 43, Issue 4: 1050-1055. DOI: 10.11772/j.issn.1001-9081.2022020317

Topic: Artificial intelligence

Xu ZHANG¹, Long SHENG¹,², Haifang ZHANG³, Feng TIAN⁴, Wei WANG¹,²

1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
2. Hebei Key Laboratory of Security Protection Information Perception and Processing (Hebei University of Engineering), Handan, Hebei 056038, China
3. Handan Emergency Rescue Command Center, Handan, Hebei 056002, China
4. School of Medicine, Hebei University of Engineering, Handan, Hebei 056038, China

Received: 2022-03-17 Revised: 2022-05-17 Accepted: 2022-05-25 Online: 2022-08-16 Published: 2023-04-10
Corresponding author: Feng TIAN
About the authors: ZHANG Xu, born in 1996 in Baoding, Hebei, M.S. candidate. His research interests include natural language processing and deep learning.
SHENG Long, born in 1982 in Handan, Hebei, Ph.D., associate professor, CCF member. His research interests include natural language processing and machine learning.
ZHANG Haifang, born in 1987 in Handan, Hebei, M.S. candidate. Her research interests include emergency treatment and emergency rescue.
WANG Wei, born in 1983 in Handan, Hebei, Ph.D., associate professor, CCF member. His research interests include artificial intelligence and urban public security.
Supported by:
National Natural Science Foundation of China (61802107); Hebei Provincial Innovation Ability Promotion Program (215576135D)

Abstract

Pre-hospital emergency text is rich in specialized vocabulary, has sparse features, and shows a high degree of label confusion. To address these problems, a text classification model based on the Label Confusion Model (LCM) was proposed. First, Bidirectional Encoder Representations from Transformers (BERT) was used to obtain dynamic word vectors and fully exploit the semantic information of specialized vocabulary. Then, a text representation vector was generated by fusing a Bidirectional Long Short-Term Memory (BiLSTM) network, weighted convolution, and an attention mechanism, which improves the feature extraction capability of the model. Finally, LCM was used to capture the semantic associations between text and labels and the dependencies among labels, so as to address the high degree of label confusion. In experiments on the pre-hospital emergency text dataset and the public news text dataset THUCNews, the F1 scores of the proposed model reached 93.46% and 97.08%, respectively, which were 0.95% to 7.01% and 0.38% to 2.00% higher than those of models such as Text Convolutional Neural Network (TextCNN), BiLSTM, and BiLSTM-Attention. The experimental results show that the proposed model can obtain the semantic information of specialized vocabulary, extract text features more accurately, and effectively address the high degree of label confusion, while also showing a certain generalization ability.

Key words: text classification, pre-hospital emergency text, deep learning, weighted convolution, Label Confusion Model (LCM)
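The last step of the abstract, using LCM to relate text to labels and labels to each other, can be illustrated with a small sketch. It follows the general idea of label confusion learning (Guo et al., AAAI 2021) rather than the authors' actual implementation: a label confusion distribution is computed from the similarity between the text representation and each label embedding, mixed with the one-hot target through a coefficient `alpha`, and the result is used as a soft training target under a KL-divergence loss. The function names, the plain dot-product similarity (no learned projection or bias), and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_label_distribution(text_vec, label_vecs, true_label, alpha):
    """Soft target that mixes the one-hot label with a label confusion distribution.

    Assumption: similarity is a plain dot product between the text vector
    and each label embedding; a trained model would learn these vectors.
    """
    sims = [sum(t * l for t, l in zip(text_vec, lv)) for lv in label_vecs]
    confusion = softmax(sims)  # how strongly the text resembles each label
    one_hot = [1.0 if i == true_label else 0.0 for i in range(len(label_vecs))]
    # alpha controls how much the original one-hot label dominates the target
    return softmax([alpha * o + c for o, c in zip(one_hot, confusion)])


def kl_divergence(target, predicted):
    """KL(target || predicted), the loss between soft target and model output."""
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)


# Hypothetical 2-dimensional embeddings: label 1 is close to label 0, label 2 is not.
text_vec = [1.0, 0.0]
label_vecs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
target = simulated_label_distribution(text_vec, label_vecs, true_label=0, alpha=4.0)
loss = kl_divergence(target, softmax([2.0, 1.0, 0.1]))
```

In this toy setup the easily confused label 1 receives more target mass than the unrelated label 2, which is the behaviour a one-hot target cannot express. A larger `alpha` pushes the target back toward the one-hot label.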

CLC number: TP391

Xu ZHANG, Long SHENG, Haifang ZHANG, Feng TIAN, Wei WANG. Pre-hospital emergency text classification model based on label confusion[J]. Journal of Computer Applications, 2023, 43(4): 1050-1055.

Figures/Tables (11)

Fig. 1 Framework of the proposed model

Fig. 2 Structure of BERT

Fig. 3 Structure of wTextCNN

Tab. 1 Dataset split

Dataset	Classes	Training set	Test set
Pre-hospital emergency text	13	1 000 × 13	200 × 13
THUCNews	13	5 000 × 13	500 × 13

Fig. 4 Accuracies of test sets under different learning rates

Fig. 5 Accuracies of test sets under different convolution kernel sizes

Fig. 6 Accuracies of test sets at different dropout values

Fig. 7 Influence of different α values on model convergence

Tab. 2 Confusion matrix of classification results

Predicted	Actual positive	Actual negative
Positive	TP	FP
Negative	FN	TN
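The confusion matrix in Tab. 2 is the usual basis for precision (P), recall (R), and F₁. The sketch below uses the standard definitions, which are presumably the ones behind the P, R, and F₁ columns reported on this page. The counts are made up for illustration, and how per-class values are averaged over the 13 classes is not stated here.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for a single class (not taken from the paper).
p, r, f1 = precision_recall_f1(tp=180, fp=20, fn=20)
```

TN does not enter any of the three metrics, which is why they remain informative when one class is much rarer than the rest.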

Tab. 3 Comparative experimental results of different models on the pre-hospital emergency text and THUCNews datasets (%)

Dataset	Model	P	R	F₁
Pre-hospital emergency text	BiLSTM	87.47	87.21	87.34
	TextCNN	87.52	87.50	87.51
	wTextCNN	88.63	88.24	88.43
	BiLSTM-Attention	88.88	88.68	88.78
	TextCNN-Attention	88.48	88.32	88.41
	wTextCNN-Attention	89.72	89.13	89.42
	BiLSTM-TextCNN	90.31	90.21	90.26
	BiLSTM-wTextCNN	91.57	91.16	91.36
	BiLSTM-wTextCNN-Attention	91.62	91.59	91.61
	ERNIE+CNN+Capsule	92.63	92.54	92.58
	Proposed model	93.61	93.31	93.46
THUCNews	BiLSTM	95.23	95.13	95.18
	TextCNN	95.32	95.28	95.31
	wTextCNN	95.37	95.06	95.21
	BiLSTM-Attention	95.24	95.16	95.20
	TextCNN-Attention	95.85	95.55	95.71
	wTextCNN-Attention	96.17	96.02	96.09
	BiLSTM-TextCNN	95.74	95.58	95.66
	BiLSTM-wTextCNN	96.21	96.05	96.13
	BiLSTM-wTextCNN-Attention	96.78	96.63	96.71
	ERNIE+CNN+Capsule	96.67	96.64	96.65
	Proposed model	97.12	97.05	97.08
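The improvement ranges quoted in the abstract (0.95% to 7.01% and 0.38% to 2.00%) appear to be relative gains in F₁ over the baselines in Tab. 3, not differences in percentage points. The short check below recomputes them from the F₁ column. The dictionary layout, the `Proposed` key, and the function name are illustrative choices. The numbers are copied from the table.

```python
# F1 scores (%) copied from Tab. 3; "Proposed" is the paper's model.
F1_EMERGENCY = {
    "BiLSTM": 87.34, "TextCNN": 87.51, "wTextCNN": 88.43,
    "BiLSTM-Attention": 88.78, "TextCNN-Attention": 88.41,
    "wTextCNN-Attention": 89.42, "BiLSTM-TextCNN": 90.26,
    "BiLSTM-wTextCNN": 91.36, "BiLSTM-wTextCNN-Attention": 91.61,
    "ERNIE+CNN+Capsule": 92.58, "Proposed": 93.46,
}
F1_THUCNEWS = {
    "BiLSTM": 95.18, "TextCNN": 95.31, "wTextCNN": 95.21,
    "BiLSTM-Attention": 95.20, "TextCNN-Attention": 95.71,
    "wTextCNN-Attention": 96.09, "BiLSTM-TextCNN": 95.66,
    "BiLSTM-wTextCNN": 96.13, "BiLSTM-wTextCNN-Attention": 96.71,
    "ERNIE+CNN+Capsule": 96.65, "Proposed": 97.08,
}


def relative_gain_range(scores, proposed="Proposed"):
    """Smallest and largest relative F1 gain (%) of the proposed model over baselines."""
    ours = scores[proposed]
    gains = [100 * (ours - f1) / f1 for name, f1 in scores.items() if name != proposed]
    return round(min(gains), 2), round(max(gains), 2)


emergency_range = relative_gain_range(F1_EMERGENCY)
thucnews_range = relative_gain_range(F1_THUCNEWS)
```

Read this way, the smallest gain on each dataset is measured against the strongest baseline (ERNIE+CNN+Capsule on the emergency text, BiLSTM-wTextCNN-Attention on THUCNews) and the largest against plain BiLSTM.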

Tab. 4 Ablation experiment results (%)

Model	F₁ on pre-hospital emergency text	F₁ on THUCNews
BERT	84.14	95.26
BERT-LCM	85.23	95.92
BiLSTM	87.35	95.13
BiLSTM-LCM	88.12	95.81
wTextCNN	88.14	95.14
wTextCNN-LCM	89.89	95.72
BiLSTM-Attention	88.63	95.15
BiLSTM-Attention-LCM	90.75	95.82
wTextCNN-Attention	89.53	96.01
wTextCNN-Attention-LCM	91.21	96.48
BiLSTM-wTextCNN	91.28	96.01
BiLSTM-wTextCNN-LCM	92.31	96.42
BiLSTM-wTextCNN-Attention-LCM	93.52	96.89
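One way to read Tab. 4 is to pair each backbone with its LCM variant and look at the difference in F₁. The sketch below does that for the six pairs present in the table. The values are copied from Tab. 4, while the data layout and the function name are illustrative. The full model (BiLSTM-wTextCNN-Attention-LCM) has no LCM-free counterpart in this table, so it is left out.

```python
# (F1 on pre-hospital emergency text, F1 on THUCNews), in %, copied from Tab. 4.
ABLATION_F1 = {
    "BERT": (84.14, 95.26), "BERT-LCM": (85.23, 95.92),
    "BiLSTM": (87.35, 95.13), "BiLSTM-LCM": (88.12, 95.81),
    "wTextCNN": (88.14, 95.14), "wTextCNN-LCM": (89.89, 95.72),
    "BiLSTM-Attention": (88.63, 95.15), "BiLSTM-Attention-LCM": (90.75, 95.82),
    "wTextCNN-Attention": (89.53, 96.01), "wTextCNN-Attention-LCM": (91.21, 96.48),
    "BiLSTM-wTextCNN": (91.28, 96.01), "BiLSTM-wTextCNN-LCM": (92.31, 96.42),
}


def lcm_gains(table):
    """F1 gain in percentage points from adding LCM to each backbone."""
    gains = {}
    for name, (emergency, news) in table.items():
        with_lcm = table.get(name + "-LCM")
        if with_lcm is not None:
            gains[name] = (round(with_lcm[0] - emergency, 2),
                           round(with_lcm[1] - news, 2))
    return gains


gains = lcm_gains(ABLATION_F1)
```

Every pair shows a positive gain on both datasets, and the gains are larger on the pre-hospital emergency text than on THUCNews, consistent with the abstract's claim that label confusion is more pronounced in the emergency data.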
