Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (5): 1317-1323. DOI: 10.11772/j.issn.1001-9081.2021030489

• Artificial Intelligence •


Text classification model combining word annotations

Xianfeng YANG1, Jiahe ZHAO1, Ziqiang LI2

  1. School of Computer Science, Southwest Petroleum University, Chengdu Sichuan 610500, China
    2. College of Movie and Media, Sichuan Normal University, Chengdu Sichuan 610066, China
  • Received: 2021-03-31 Revised: 2021-07-08 Accepted: 2021-07-21 Online: 2022-06-11 Published: 2022-05-10
  • Contact: Xianfeng YANG (565695835@qq.com)
  • About author: YANG Xianfeng, born in 1974, M.S., professor. Her research interests include computer image processing and smart education.
    ZHAO Jiahe, born in 1997, M.S. candidate. His research interests include natural language processing.
    LI Ziqiang, born in 1970, Ph.D., professor, CCF member. His research interests include machine learning and smart education.
  • Supported by:
    National Natural Science Foundation of China (61802321); Key Research and Development Program of Science and Technology Department of Sichuan Province (2020YFN0019)


Abstract:

Traditional text feature representation methods cannot fully solve the polysemy problem of words. To address this problem, a text classification model combining word annotations was proposed. Firstly, using an existing Chinese dictionary, dictionary annotations were selected for the words of a text according to their contexts, and Bidirectional Encoder Representations from Transformers (BERT) encoding was performed on the annotations to generate annotation sentence vectors. Then, the annotation sentence vectors were fused with the word embedding vectors as the input layer to enrich the feature information of the input text. Finally, a Bidirectional Gated Recurrent Unit (BiGRU) was used to learn the feature information of the text, and an attention mechanism was introduced to highlight the key feature vectors. Experimental results of text classification on the public THUCNews dataset and the Sina Weibo sentiment classification dataset show that the text classification models combining BERT word annotations significantly outperform their counterparts without word annotations. Among all the experimental models, the proposed BERT word annotation_BiGRU_Attention model achieves the highest precision and recall, and its F1-scores, which reflect the overall performance, reach 98.16% and 96.52% on the two datasets respectively.
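To make the described pipeline concrete, below is a minimal PyTorch sketch of the architecture outlined in the abstract: BERT-encoded annotation sentence vectors are fused (here by concatenation) with character embeddings and fed to a BiGRU with additive attention. This is not the authors' released code; the names `encode_annotation` and `AnnotationFusionClassifier`, the use of the [CLS] hidden state as the sentence vector, the concatenation fusion, the additive attention form, and all dimensions are illustrative assumptions.

```python
# Sketch of the BERT word-annotation + BiGRU + attention model; all names
# and hyperparameters are assumptions made for illustration.
import torch
import torch.nn as nn

def encode_annotation(annotation: str) -> torch.Tensor:
    """Encode one dictionary annotation into a sentence vector with BERT.

    Assumes the Hugging Face `transformers` package and the public
    `bert-base-chinese` checkpoint; the [CLS] hidden state is taken as the
    sentence vector (one plausible choice, not specified by the paper).
    """
    from transformers import BertTokenizer, BertModel
    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")
    inputs = tokenizer(annotation, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = bert(**inputs)
    return out.last_hidden_state[:, 0]  # (1, 768) [CLS] vector

class AnnotationFusionClassifier(nn.Module):
    def __init__(self, vocab_size, char_dim=128, bert_dim=768,
                 hidden_dim=256, num_classes=10):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, char_dim)  # character embeddings
        self.bigru = nn.GRU(char_dim + bert_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Additive attention over the BiGRU hidden states.
        self.att_proj = nn.Linear(2 * hidden_dim, 2 * hidden_dim)
        self.att_vec = nn.Linear(2 * hidden_dim, 1, bias=False)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, char_ids, annot_vecs):
        # char_ids:   (batch, seq_len)            character indices
        # annot_vecs: (batch, seq_len, bert_dim)  per-character annotation
        #             sentence vectors, precomputed with encode_annotation()
        x = torch.cat([self.char_emb(char_ids), annot_vecs], dim=-1)
        h, _ = self.bigru(x)                                 # (batch, seq, 2*hidden)
        scores = self.att_vec(torch.tanh(self.att_proj(h)))  # (batch, seq, 1)
        alpha = torch.softmax(scores, dim=1)                 # attention weights
        context = (alpha * h).sum(dim=1)                     # weighted sum
        return self.fc(context)

# Smoke test with random tensors standing in for real inputs.
model = AnnotationFusionClassifier(vocab_size=5000)
logits = model(torch.randint(0, 5000, (2, 32)), torch.randn(2, 32, 768))
print(logits.shape)  # torch.Size([2, 10])
```

Since the dictionary is fixed, the annotation sentence vectors can be precomputed offline once per dictionary entry, so BERT need not run at every training step.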

Key words: polysemy, word annotation, Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Gated Recurrent Unit (BiGRU), attention mechanism, text classification

CLC number: