Text classification model combining word annotations

doi:10.11772/j.issn.1001-9081.2021030489

Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1317-1323. DOI: 10.11772/j.issn.1001-9081.2021030489

Topic: Artificial Intelligence

Xianfeng YANG¹, Jiahe ZHAO¹, Ziqiang LI²

1. School of Computer Science, Southwest Petroleum University, Chengdu, Sichuan 610500, China
2. College of Movie and Media, Sichuan Normal University, Chengdu, Sichuan 610066, China

Received: 2021-03-31 Revised: 2021-07-08 Accepted: 2021-07-21 Online: 2022-06-11 Published: 2022-05-10
Corresponding author: Xianfeng YANG
About the authors: YANG Xianfeng, born in 1974 in Nanbu, Sichuan, female, M. S., professor. Her research interests include computer image processing and smart education. E-mail: 565695835@qq.com
ZHAO Jiahe, born in 1997 in Weinan, Shaanxi, male, M. S. candidate. His research interests include natural language processing.
LI Ziqiang, born in 1970 in Qingshen, Sichuan, Ph. D., professor, CCF member. His research interests include machine learning and smart education.
Supported by:
National Natural Science Foundation of China (61802321); Key Research and Development Program of Science and Technology Department of Sichuan Province (2020YFN0019)

Abstract

Traditional text feature representation methods cannot fully resolve polysemy. To address this problem, a text classification model combining word annotations is proposed. First, using an existing Chinese dictionary, the dictionary annotation of each word (Chinese character) in the text is selected according to its context, and the annotations are encoded with Bidirectional Encoder Representations from Transformers (BERT) to generate annotation sentence vectors. Then, the annotation sentence vectors are fused with the word embedding vectors to form the input layer, which enriches the feature information of the input text. Finally, a Bidirectional Gated Recurrent Unit (BiGRU) learns the feature information of the text, and an attention mechanism is introduced to highlight the key feature vectors. Text classification experiments on the public THUCNews dataset and a Sina Weibo sentiment classification dataset show that the models combining BERT word annotations perform significantly better than the models without word annotations. Among all the experimental models, the proposed BERT word annotation_BiGRU_Attention model achieves the highest precision and recall, and its F1-Score, which reflects overall performance, reaches 98.16% and 96.52% on the two datasets respectively.

Key words: polysemy, word annotation, Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Gated Recurrent Unit (BiGRU), attention mechanism, text classification
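The input fusion and attention steps described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the abstract only says the vectors are "fused", so concatenation is an assumption here, as is the dot-product scoring in `attention_pool`. The function names are hypothetical, the BiGRU between the two steps is omitted, and the 300 and 768 dimensions are taken from the hyperparameter table.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse_inputs(char_embeddings: Sequence[Vector],
                annotation_vectors: Sequence[Vector]) -> List[Vector]:
    """Concatenate each embedding with its annotation sentence vector.

    Concatenation is an assumption; the abstract only says the two are fused.
    """
    if len(char_embeddings) != len(annotation_vectors):
        raise ValueError("one annotation vector is required per character")
    return [list(e) + list(a) for e, a in zip(char_embeddings, annotation_vectors)]


def attention_pool(hidden_states: Sequence[Vector], query: Vector) -> Vector:
    """Weight hidden states by softmax(dot(h_t, query)) and sum them."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]


# Toy example: 3 characters, 300-dim embeddings, 768-dim annotation vectors.
chars = [[0.1] * 300 for _ in range(3)]
notes = [[0.2] * 768 for _ in range(3)]
fused = fuse_inputs(chars, notes)
print(len(fused), len(fused[0]))  # 3 1068

# Stand-in for BiGRU outputs: equal scores give uniform attention weights.
states = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(states, query=[0.0, 0.0]))  # [0.5, 0.5]
```

In a trained model the query vector and the recurrent states would be learned; the toy values only show the shapes and the weighting.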

CLC number: TP183

Xianfeng YANG, Jiahe ZHAO, Ziqiang LI. Text classification model combining word annotations[J]. Journal of Computer Applications, 2022, 42(5): 1317-1323.

Parameter	Dataset 1	Dataset 2
Embedding dimension	300	300
BERT annotation encoding dimension	768	768
BiGRU	256	64
Attention size	64	32
Dropout	0.2	0.2
Activation function	Softmax	Softmax
Loss function	categorical_crossentropy	categorical_crossentropy
Optimizer	Adam	Adam
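For experimentation, the settings in the table above could be collected into a small configuration object. The sketch below is one possible layout, not something from the paper: the class and field names are hypothetical, and reading the "BiGRU" row as the number of hidden units is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    embedding_dim: int    # "Embedding dimension" row
    annotation_dim: int   # "BERT annotation encoding dimension" row
    bigru_units: int      # "BiGRU" row, read here as hidden units (assumption)
    attention_size: int   # "Attention size" row
    dropout: float
    activation: str = "softmax"
    loss: str = "categorical_crossentropy"
    optimizer: str = "adam"


# Values copied from the table; the dictionary keys are illustrative labels.
CONFIGS = {
    "dataset1": ModelConfig(300, 768, bigru_units=256, attention_size=64, dropout=0.2),
    "dataset2": ModelConfig(300, 768, bigru_units=64, attention_size=32, dropout=0.2),
}

for name, cfg in CONFIGS.items():
    print(name, cfg.bigru_units, cfg.attention_size)
```

Freezing the dataclass keeps the two settings from being modified by accident when both datasets are run in the same script.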

Model	Precision	Recall	F1-Score
TextCNN	0.9352	0.9348	0.9350
LSTM	0.9120	0.9106	0.9104
BiGRU	0.9470	0.9461	0.9466
Word2Vec_TextCNN	0.9379	0.9364	0.9371
Word2Vec_LSTM	0.9409	0.9398	0.9403
Word2Vec_BiGRU	0.9489	0.9481	0.9485
BERT word annotation_TextCNN	0.9469	0.9473	0.9463
BERT word annotation_LSTM	0.9670	0.9651	0.9660
BERT word annotation_BiGRU	0.9710	0.9712	0.9711
BERT word annotation_BiGRU_Attention	0.9820	0.9811	0.9816
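The F1-Score column combines precision and recall as their harmonic mean. A minimal sketch of that formula follows, using the precision and recall reported for the BERT word annotation BiGRU model. The function name is hypothetical. The page does not say how the metrics were averaged over classes, so values recomputed this way need not match every row exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 0.9710 and recall 0.9712, as reported for the BiGRU model with BERT word annotations.
print(round(f1_score(0.9710, 0.9712), 4))  # 0.9711
```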

Model	Precision	Recall	F1-Score
TextCNN	0.9285	0.9298	0.9301
LSTM	0.9390	0.9383	0.9386
BiGRU	0.9470	0.9465	0.9548
BERT word annotation_TextCNN	0.9460	0.9458	0.9459
BERT word annotation_LSTM	0.9515	0.9506	0.9510
BERT word annotation_BiGRU	0.9570	0.9562	0.9566
BERT word annotation_BiGRU_Attention	0.9655	0.9650	0.9652
