Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (4): 1050-1055. DOI: 10.11772/j.issn.1001-9081.2022020317

• Artificial Intelligence •

Pre-hospital emergency text classification model based on label confusion

Xu ZHANG1, Long SHENG1,2, Haifang ZHANG3, Feng TIAN4, Wei WANG1,2

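  1. School of Information and Electrical Engineering, Hebei University of Engineering, Handan, Hebei 056038, China
    2. Hebei Key Laboratory of Security Protection Information Perception and Processing (Hebei University of Engineering), Handan, Hebei 056038, China
    3. Handan Emergency Rescue Command Center, Handan, Hebei 056002, China
    4. School of Medicine, Hebei University of Engineering, Handan, Hebei 056038, China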
  • Received: 2022-03-17 Revised: 2022-05-17 Accepted: 2022-05-25 Online: 2022-08-16 Published: 2023-04-10
  • Corresponding author: Feng TIAN
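  • About the authors: ZHANG Xu, born in 1996, from Baoding, Hebei, M. S. candidate. His research interests include natural language processing and deep learning.
    SHENG Long, born in 1982, from Handan, Hebei, Ph. D., associate professor, CCF member. His research interests include natural language processing and machine learning.
    ZHANG Haifang, born in 1987, from Handan, Hebei, M. S. candidate. Her research interests include emergency treatment and emergency rescue.
    WANG Wei, born in 1983, from Handan, Hebei, Ph. D., associate professor, CCF member. His research interests include artificial intelligence and urban public security.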
  • Supported by:
    National Natural Science Foundation of China (61802107); Hebei Provincial Innovation Ability Promotion Program (215576135D)


Abstract:

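Pre-hospital emergency text contains abundant specialized vocabulary, has sparse features, and exhibits a high degree of label confusion. To address these problems, a text classification model based on the Label Confusion Model (LCM) was proposed. First, Bidirectional Encoder Representations from Transformers (BERT) was used to obtain dynamic word vectors and fully exploit the semantic information of specialized vocabulary. Then, a Bidirectional Long Short-Term Memory (BiLSTM) network, weighted convolution, and an attention mechanism were fused to generate the text representation vector, improving the feature extraction capability of the model. Finally, LCM was used to capture the semantic associations between text and labels and the dependencies among labels, addressing the high degree of label confusion. In experiments on a pre-hospital emergency text dataset and the public news text dataset THUCNews, the proposed model achieved F1 scores of 93.46% and 97.08% respectively, which were 0.95% to 7.01% and 0.38% to 2.00% higher than those of models such as Text Convolutional Neural Network (TextCNN), BiLSTM, and BiLSTM-Attention. Experimental results show that the proposed model can capture the semantic information of specialized vocabulary, extract text features more accurately, and effectively address the high degree of label confusion, while also showing a degree of generalization ability.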
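The abstract names LCM but does not give its equations. As a rough illustration only, the Python sketch below shows the kind of loss that LCM-style training is generally built on: a label confusion distribution computed from the similarity between the text representation and each label embedding, a simulated label distribution obtained by blending that distribution with the one-hot gold label, and a KL-divergence loss against the classifier's predicted distribution. It is not the authors' implementation. The text vector and label embeddings are taken as given (in the proposed model the text vector would come from the BERT, BiLSTM, weighted-convolution and attention encoder); the plain dot-product similarity, the omission of any trainable linear layer before the softmax, the weight `alpha = 4.0`, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(xs: Vector) -> List[float]:
    """Numerically stable softmax over a 1-D list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_confusion_distribution(text_vec: Vector, label_vecs: Sequence[Vector]) -> List[float]:
    """Softmax over the similarity between the text vector and each label embedding.

    Assumption: plain dot-product similarity; a trainable linear layer that
    LCM variants may apply before this softmax is omitted here.
    """
    return softmax([dot(text_vec, lv) for lv in label_vecs])


def simulated_label_distribution(true_label: int, confusion: Vector, alpha: float = 4.0) -> List[float]:
    """Blend the one-hot gold label with the confusion distribution.

    alpha weights the one-hot vector; 4.0 is an arbitrary illustrative value.
    """
    one_hot = [1.0 if c == true_label else 0.0 for c in range(len(confusion))]
    return softmax([alpha * t + c for t, c in zip(one_hot, confusion)])


def kl_divergence(p: Vector, q: Vector, eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def lcm_loss(logits: Vector, text_vec: Vector, label_vecs: Sequence[Vector],
             true_label: int, alpha: float = 4.0) -> float:
    """KL divergence from the simulated label distribution to the model's prediction."""
    predicted = softmax(logits)
    confusion = label_confusion_distribution(text_vec, label_vecs)
    target = simulated_label_distribution(true_label, confusion, alpha)
    return kl_divergence(target, predicted)


if __name__ == "__main__":
    # Hypothetical 3-class example: labels 0 and 1 have similar embeddings.
    text_vec = [0.9, 0.1]
    label_vecs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
    logits = [2.0, 1.0, -1.0]
    print(lcm_loss(logits, text_vec, label_vecs, true_label=0))
```

With a large `alpha` the target approaches the one-hot label and the loss approaches ordinary cross-entropy; with a smaller `alpha`, more probability mass moves to labels whose embeddings resemble the text, so predicting one of those easily confused classes is penalized less.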

Key words: text classification, pre-hospital emergency text, deep learning, weighted convolution, Label Confusion Model (LCM)

