Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3485-3489. DOI: 10.11772/j.issn.1001-9081.2020060914

• 2020 Asian Conference on Artificial Intelligence Technology (ACAIT 2020) •

Chinese short text classification model with multi-head self-attention mechanism

ZHANG Xiaochuan1, DAI Xuyao2, LIU Lu1, FENG Tianshuo1

  1. College of Liangjiang Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China;
    2. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2020-06-19  Revised: 2020-08-26  Online: 2020-12-10  Published: 2020-10-20
  • Corresponding author: DAI Xuyao (1995-), male, born in Chuzhou, Anhui, M.S. candidate, CCF member; research interests: intelligent systems and applications, natural language processing. das7575@163.com
  • About the authors: ZHANG Xiaochuan (1965-), male, born in Linshui, Sichuan, professor, M.S.; research interests: computational intelligence, computer game playing, software engineering, intelligent robots. LIU Lu (1995-), female, born in Baoji, Shaanxi, M.S. candidate, CCF member; research interests: intelligent systems and applications, knowledge graph. FENG Tianshuo (1997-), male, born in Nanping, Fujian, M.S. candidate; research interests: intelligent driving, behavioral decision-making.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61702063) and the Natural Science Foundation of Chongqing (cstc2019jcyj-msxmX0544).

Abstract: To address the feature sparsity caused by semantic ambiguity in Chinese short texts, which lack context information, a text classification model combining Convolutional Neural Network and Multi-Head self-Attention mechanism (CNN-MHA) was proposed. Firstly, the existing Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model was used to represent the sentence-level short texts as character-level vectors. Secondly, in order to reduce noise, the Multi-Head self-Attention mechanism (MHA) was used to learn the word dependencies within the text sequence and generate hidden-layer vectors carrying global semantic information, and these hidden-layer vectors were then input into the Convolutional Neural Network (CNN) to generate the text classification feature vector. Finally, to improve the classification performance, the output of the convolutional layer was fused with the sentence features extracted by the BERT model, and the fused features were fed into the classifier for re-classification. The CNN-MHA model was compared with the TextCNN, BERT and TextRCNN models. Experimental results show that the F1 score of the improved model on the SogouCS dataset is 3.99%, 0.76% and 2.89% higher than those of the comparison models respectively, which verifies the effectiveness of the improved model.
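
Below is a minimal PyTorch sketch of the pipeline described in the abstract. It assumes the BERT character-level token vectors and the pooled sentence vector are produced elsewhere (for example by the HuggingFace transformers BertModel); the hidden size, number of attention heads, convolution channels, kernel width and class count are illustrative placeholders, not the hyperparameters reported in the paper.

import torch
import torch.nn as nn

class CNNMHA(nn.Module):
    def __init__(self, hidden=768, heads=8, channels=256, kernel=3, classes=10):
        super().__init__()
        # Multi-head self-attention over the character-level BERT vectors
        self.mha = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # 1-D convolution extracts local n-gram features from the attention output
        self.conv = nn.Conv1d(hidden, channels, kernel, padding=kernel // 2)
        # Classifier over the fused CNN features and BERT sentence feature
        self.fc = nn.Linear(channels + hidden, classes)

    def forward(self, token_vecs, sent_vec):
        # token_vecs: (batch, seq_len, hidden)  e.g. BERT last_hidden_state
        # sent_vec:   (batch, hidden)           e.g. BERT pooled sentence feature
        attn_out, _ = self.mha(token_vecs, token_vecs, token_vecs)
        conv_out = torch.relu(self.conv(attn_out.transpose(1, 2)))  # (batch, channels, seq_len)
        pooled = conv_out.max(dim=2).values           # global max pooling per channel
        fused = torch.cat([pooled, sent_vec], dim=1)  # feature fusion
        return self.fc(fused)                         # class logits

# Smoke test with random tensors standing in for BERT outputs.
model = CNNMHA()
print(model(torch.randn(2, 64, 768), torch.randn(2, 768)).shape)  # torch.Size([2, 10])

Global max pooling here simply keeps the strongest local feature per convolution channel before fusion with the sentence vector; the paper may use a different pooling or fusion scheme.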

Key words: Chinese short text, text classification, Multi-Head self-Attention mechanism (MHA), Convolutional Neural Network (CNN), feature fusion

CLC Number: