Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (8): 2396-2405. DOI: 10.11772/j.issn.1001-9081.2022071071

• Artificial intelligence •

General text classification model combining attention and cropping mechanism

Yumeng CUI, Jingya WANG, Xiaowen LIU, Shangyi YAN, Zhizhong TAO   

  1. School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
  • Received: 2022-07-23 Revised: 2022-09-24 Accepted: 2022-09-28 Online: 2023-01-15 Published: 2023-08-10
  • Contact: Jingya WANG
  • About author: CUI Yumeng, born in 1998 in Changchun, Jilin, M. S. candidate, CCF member. His research interests include named entity recognition and text classification.
    LIU Xiaowen, born in 1997 in Dongping, Shandong, M. S. candidate. His research interests include digital image processing and neural networks.
    YAN Shangyi, born in 1998 in Baoding, Hebei, M. S. candidate. His research interests include natural language processing and text classification.
    TAO Zhizhong, born in 1997 in Linyi, Shandong, M. S. candidate. His research interests include deep learning and image style transfer.
  • Supported by:
    National Social Science Foundation of China(20AZD114)

Abstract:

To address the problem that current classification models are generally effective only on texts of a single length, while long and short texts are heavily mixed in real-world scenarios, a General Long and Short Text Classification Model based on Hybrid Neural Network (GLSTCM-HNN) was proposed. Firstly, BERT (Bidirectional Encoder Representations from Transformers) was applied to encode texts dynamically. Then, convolution operations were used to extract local semantic information, and a Dual Channel ATTention mechanism (DCATT) was built to enhance key text regions. Meanwhile, a Recurrent Neural Network (RNN) was utilized to capture global semantic information, and a Long Text Cropping Mechanism (LTCM) was established to filter out the critical text. Finally, the extracted local and global features were fused, reduced in dimensionality, and fed into a Softmax function to obtain the output category. In comparison experiments on four public datasets, the F1 score of GLSTCM-HNN was up to 3.87 percentage points higher than that of the baseline model (BERT-TextCNN) and up to 5.86 percentage points higher than that of the best-performing comparison model (BERT). In two generality experiments on mixed texts, the F1 score of GLSTCM-HNN was 6.63 and 37.22 percentage points higher, respectively, than that of the generality model proposed in existing research, CBLGA (a CNN-BiLSTM/BiGRU hybrid text classification model based on Attention). Experimental results show that the proposed model effectively improves the accuracy of text classification, and that it generalizes both to texts whose length differs from that of the training data and to mixed long and short texts.
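The abstract describes a two-branch architecture: BERT encoding feeds a convolutional branch (local features, refined by DCATT) and a recurrent branch (global features, with LTCM filtering long inputs), whose outputs are fused and classified via Softmax. The following is a minimal PyTorch sketch of that pipeline, not the authors' published implementation: the BiGRU choice, hidden sizes, mean-pooling, and the simple attention stand-in for DCATT are all illustrative assumptions, and LTCM is only indicated by a comment.

```python
# Minimal sketch of the GLSTCM-HNN pipeline described in the abstract.
# Assumptions (not from the paper): bert-base-chinese encoder, BiGRU for
# the RNN branch, per-position attention as a stand-in for DCATT, and
# mean-pooling of the recurrent states.
import torch
import torch.nn as nn
from transformers import BertModel

class GLSTCM_HNN(nn.Module):
    def __init__(self, num_classes, bert_name="bert-base-chinese",
                 hidden=256, kernel_sizes=(2, 3, 4), num_filters=128):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)  # dynamic encoding
        d = self.bert.config.hidden_size                  # 768 for base BERT

        # Local branch: 1-D convolutions over the token representations.
        self.convs = nn.ModuleList(
            nn.Conv1d(d, num_filters, k) for k in kernel_sizes)
        # Stand-in for the paper's Dual Channel ATTention (DCATT):
        # a simple learned per-position attention over conv features.
        self.local_attn = nn.Linear(num_filters, 1)

        # Global branch: an RNN (assumed BiGRU) over the full sequence.
        self.rnn = nn.GRU(d, hidden, batch_first=True, bidirectional=True)

        # Fusion + classification layer.
        self.classifier = nn.Linear(
            num_filters * len(kernel_sizes) + 2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids,
                      attention_mask=attention_mask).last_hidden_state  # (B, T, d)

        # Local semantic features with attention-weighted pooling.
        x = h.transpose(1, 2)                             # (B, d, T)
        local = []
        for conv in self.convs:
            c = torch.relu(conv(x)).transpose(1, 2)       # (B, T', F)
            w = torch.softmax(self.local_attn(c), dim=1)  # position weights
            local.append((w * c).sum(dim=1))              # (B, F)
        local = torch.cat(local, dim=-1)

        # Global semantic features; the paper's Long Text Cropping
        # Mechanism (LTCM) would filter the sequence before this step.
        out, _ = self.rnn(h)
        global_feat = out.mean(dim=1)                     # (B, 2*hidden)

        # Fuse local and global features, then classify via Softmax.
        logits = self.classifier(torch.cat([local, global_feat], dim=-1))
        return torch.log_softmax(logits, dim=-1)
```

Both branches are pooled to fixed-length vectors so that local n-gram evidence and global sequence context can be concatenated before classification, mirroring the fusion step the abstract describes.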

Key words: deep learning, text classification, attention mechanism, cropping mechanism, general model

CLC Number: