Chinese short text classification model with multi-head self-attention mechanism
ZHANG Xiaochuan1, DAI Xuyao2, LIU Lu1, FENG Tianshuo1
1. College of Liangjiang Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China; 2. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
Abstract: Aiming at the feature sparsity caused by the semantic ambiguity that results from the lack of context information in Chinese short texts, a text classification model combining Convolutional Neural Network and Multi-Head self-Attention mechanism (CNN-MHA) was proposed. Firstly, the existing Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model was used to represent sentence-level short texts as character-level vectors. Secondly, in order to reduce noise, the Multi-Head self-Attention mechanism (MHA) was used to learn the word dependencies inside the text sequence and generate hidden-layer vectors with global semantic information. Then, the hidden-layer vectors were fed into the Convolutional Neural Network (CNN) to generate the text classification feature vector. In order to improve the classification effect, the output of the convolutional layer was fused with the sentence features extracted by the BERT model and then input to the classifier for re-classification. Finally, the CNN-MHA model was compared with the TextCNN, BERT and TextRCNN models. Experimental results show that on the SogouCS dataset the F1 score of the improved model is increased by 3.99%, 0.76% and 2.89% respectively compared with those of the comparison models, which proves the effectiveness of the improved model.
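The pipeline in the abstract (BERT character-level encoding, multi-head self-attention, convolution, feature fusion, classification) can be sketched in a few dozen lines of PyTorch. The code below is a minimal illustration under stated assumptions, not the authors' released implementation: the encoder checkpoint "bert-base-chinese", 8 attention heads, kernel sizes {2, 3, 4} and 128 filters per kernel size are all assumed for the example, and the "sentence features extracted by BERT" are taken to be the pooled [CLS] output.

import torch
import torch.nn as nn
from transformers import BertModel

class CNNMHA(nn.Module):
    """Hedged sketch of the CNN-MHA pipeline described in the abstract."""
    def __init__(self, num_classes, hidden=768, heads=8,
                 kernel_sizes=(2, 3, 4), num_filters=128):
        super().__init__()
        # Step 1: BERT represents a sentence-level short text as
        # character-level vectors (Chinese BERT tokenizes per character).
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        # Step 2: multi-head self-attention learns word dependencies
        # inside the sequence, yielding hidden-layer vectors that carry
        # global semantic information.
        self.mha = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Step 3: 1-D convolutions over the attended sequence produce
        # the text classification feature vector.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes
        )
        # Step 4: the convolutional features are fused (concatenated,
        # an assumption of this sketch) with BERT's sentence feature
        # before the final classifier.
        fused_dim = num_filters * len(kernel_sizes) + hidden
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        chars = out.last_hidden_state                # (B, L, H) character vectors
        # Mask out padding positions when attending.
        attended, _ = self.mha(chars, chars, chars,
                               key_padding_mask=~attention_mask.bool())
        x = attended.transpose(1, 2)                 # (B, H, L) for Conv1d
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        cnn_feat = torch.cat(pooled, dim=1)          # max-pooled feature maps
        fused = torch.cat([cnn_feat, out.pooler_output], dim=1)
        return self.classifier(fused)                # logits for re-classification

For instance, logits = CNNMHA(num_classes=5)(input_ids, attention_mask) would yield class scores for a batch of tokenized SogouCS headlines; the exact fusion operator and hyperparameters should be taken from the paper itself.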
ZHANG Xiaochuan, DAI Xuyao, LIU Lu, FENG Tianshuo. Chinese short text classification model with multi-head self-attention mechanism. Journal of Computer Applications, 2020, 40(12): 3485-3489.
[1] BRINDHA S, PRABHA K, SUKUMARAN S. A survey on classification techniques for text mining[C]//Proceedings of the 2016 3rd International Conference on Advanced Computing and Communication Systems. Piscataway: IEEE, 2016: 1-5.
[2] LHAZMIR S, EL MOUDDEN I, KOBBANE A. Feature extraction based on principal component analysis for text categorization[C]//Proceedings of the 2017 International Conference on Performance Evaluation and Modeling in Wired and Wireless Networks. Piscataway: IEEE, 2017: 1-6.
[3] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 2013 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2013: 3111-3119.
[4] PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2014: 1532-1543.
[5] LYU F, HAN M, QIU T. Remote sensing image classification based on ensemble extreme learning machine with stacked autoencoder[J]. IEEE Access, 2017, 5: 9021-9031.
[6] KIM Y. Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2014: 1746-1751.
[7] LAI S, XU L, LIU K, et al. Recurrent convolutional neural networks for text classification[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2015: 2267-2273.
[8] TAI K S, SOCHER R, MANNING C D. Improved semantic representations from tree-structured long short-term memory networks[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics/the 7th International Joint Conference on Natural Language Processing. Stroudsburg: ACL, 2015: 1556-1566.
[9] LI Y H, LIANG S C, REN J, et al. Text classification method based on recurrent neural network variants and convolutional neural network[J]. Journal of Northwest University (Natural Science Edition), 2019, 49(4): 573-579.
[10] SHAO Q, MA H P. Convolutional neural network text classification model with self-attention mechanism[J]. Journal of Chinese Computer Systems, 2019, 40(6): 1137-1141.
[11] JOHNSON R, ZHANG T. Deep pyramid convolutional neural networks for text categorization[C]//Proceedings of the 2017 55th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2017: 562-570.
[12] WANG B. Disconnected recurrent neural networks for text categorization[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 2311-2320.
[13] YANG Z, YANG D, DYER C, et al. Hierarchical attention networks for document classification[C]//Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2016: 1480-1489.
[14] DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2019: 4171-4186.
[15] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[16] Sogou Lab. Sohu news data (SogouCS)[EB/OL]. [2020-01-13]. https://www.sogou.com/labs/resource/cs.php.
[17] LU J, MA C X, YANG T F, et al. Multi-category text information classification with Text-CRNN+attention architecture[J]. Application Research of Computers, 2020, 37(6): 1693-1696, 1701.