Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (12): 3485-3489. DOI: 10.11772/j.issn.1001-9081.2020060914

• 2020 Asian Conference on Artificial Intelligence Technology (ACAIT 2020) •

Chinese short text classification model with multi-head self-attention mechanism

ZHANG Xiaochuan1, DAI Xuyao2, LIU Lu1, FENG Tianshuo1

  1. College of Liangjiang Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China;
    2. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
  • Received: 2020-06-19  Revised: 2020-08-26  Online: 2020-12-10  Published: 2020-10-20
  • Corresponding author: DAI Xuyao (1995-), male, born in Chuzhou, Anhui, M.S. candidate, CCF member; research interests: intelligent systems and applications, natural language processing. das7575@163.com
  • About the authors: ZHANG Xiaochuan (1965-), male, born in Linshui, Sichuan, professor, M.S.; research interests: computational intelligence, computer game playing, software engineering, intelligent robots. LIU Lu (1995-), female, born in Baoji, Shaanxi, M.S. candidate, CCF member; research interests: intelligent systems and applications, knowledge graph. FENG Tianshuo (1997-), male, born in Nanping, Fujian, M.S. candidate; research interests: intelligent driving, behavioral decision-making.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61702063) and the Natural Science Foundation of Chongqing (cstc2019jcyj-msxmX0544).

Abstract: To address the feature sparsity caused by semantic ambiguity in Chinese short texts, which lack context information, a text classification model combining Convolutional Neural Network and Multi-Head self-Attention mechanism (CNN-MHA) was proposed. Firstly, the existing Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model was used to represent the sentence-level short texts as character-level vectors. Secondly, in order to reduce noise, the Multi-Head self-Attention mechanism (MHA) was used to learn the word dependencies within the text sequence and generate hidden-layer vectors carrying global semantic information, and these hidden-layer vectors were then input into the Convolutional Neural Network (CNN) to generate the text classification feature vector. Finally, to improve the classification performance, the output of the convolutional layer was fused with the sentence features extracted by the BERT model, and the fused features were fed into the classifier for re-classification. The CNN-MHA model was compared with the TextCNN, BERT and TextRCNN models. Experimental results show that the F1 score of the improved model on the SogouCS dataset is 3.99%, 0.76% and 2.89% higher than those of the comparison models respectively, which verifies the effectiveness of the improved model.
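
Below is a minimal PyTorch sketch of the pipeline described in the abstract. It assumes the BERT character-level token vectors and the pooled sentence vector are produced elsewhere (for example by the HuggingFace transformers BertModel); the hidden size, number of attention heads, convolution channels, kernel width and class count are illustrative placeholders, not the hyperparameters reported in the paper.

import torch
import torch.nn as nn

class CNNMHA(nn.Module):
    def __init__(self, hidden=768, heads=8, channels=256, kernel=3, classes=10):
        super().__init__()
        # Multi-head self-attention over the character-level BERT vectors
        self.mha = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # 1-D convolution extracts local n-gram features from the attention output
        self.conv = nn.Conv1d(hidden, channels, kernel, padding=kernel // 2)
        # Classifier over the fused CNN features and BERT sentence feature
        self.fc = nn.Linear(channels + hidden, classes)

    def forward(self, token_vecs, sent_vec):
        # token_vecs: (batch, seq_len, hidden)  e.g. BERT last_hidden_state
        # sent_vec:   (batch, hidden)           e.g. BERT pooled sentence feature
        attn_out, _ = self.mha(token_vecs, token_vecs, token_vecs)
        conv_out = torch.relu(self.conv(attn_out.transpose(1, 2)))  # (batch, channels, seq_len)
        pooled = conv_out.max(dim=2).values           # global max pooling per channel
        fused = torch.cat([pooled, sent_vec], dim=1)  # feature fusion
        return self.fc(fused)                         # class logits

# Smoke test with random tensors standing in for BERT outputs.
model = CNNMHA()
print(model(torch.randn(2, 64, 768), torch.randn(2, 768)).shape)  # torch.Size([2, 10])

Global max pooling here simply keeps the strongest local feature per convolution channel before fusion with the sentence vector; the paper may use a different pooling or fusion scheme.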

Key words: Chinese short text, text classification, Multi-Head self-Attention mechanism (MHA), Convolutional Neural Network (CNN), feature fusion

CLC Number: