Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (1): 57-63. DOI: 10.11772/j.issn.1001-9081.2021020366
Xueqiang LYU, Chen PENG, Le ZHANG, Zhi’an DONG, Xindong YOU
Received: 2021-03-11
Revised: 2021-04-28
Accepted: 2021-04-29
Online: 2021-05-21
Published: 2022-01-10
Contact: Le ZHANG
About author: LYU Xueqiang, born in 1970, Ph. D., professor, CCF member. His research interests include Chinese and multimedia information processing.
Abstract: Multi-Label Text Classification (MLTC) is one of the important subtasks in the field of Natural Language Processing (NLP). To deal with the complex correlations among multiple labels, an MLTC method fusing BERT and label semantic attention, named TLA-BERT, was proposed. First, the contextual vector representation of the input text was learned by fine-tuning an autoencoding pre-trained model. Then, each label was encoded individually by a Long Short-Term Memory (LSTM) network. Finally, an attention mechanism was used to explicitly highlight the contribution of the text to each label in order to predict the multi-label sequence. Experimental results show that, compared with the Sequence Generation Model (SGM) algorithm, the proposed method improves the F1 score by 2.8 percentage points and 1.5 percentage points on the public AAPD and RCV1-v2 datasets, respectively.
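The pipeline described in the abstract (a fine-tuned BERT text encoder, an independent LSTM encoder for label semantics, and label-to-token attention) can be illustrated with a short PyTorch sketch. Everything below is a hedged reconstruction, not the authors' released code: the class name `TLABertSketch`, the use of `bert-base-uncased`, the mean-pooling of BiLSTM outputs into one vector per label, and the omission of padding masks in the attention step are all illustrative assumptions.

```python
# Minimal sketch of the TLA-BERT idea, under the assumptions stated above.
import torch
import torch.nn as nn
from transformers import BertModel

class TLABertSketch(nn.Module):
    """BERT text encoder + per-label BiLSTM encoder + label-word attention."""
    def __init__(self, label_vocab_size: int, label_emb_dim: int = 300,
                 bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)  # fine-tuned jointly
        hidden = self.bert.config.hidden_size             # 768 for bert-base
        self.label_word_emb = nn.Embedding(label_vocab_size, label_emb_dim)
        # BiLSTM encodes each label's name separately; 2 * (hidden // 2)
        # matches BERT's hidden size so dot-product attention is possible.
        self.label_lstm = nn.LSTM(label_emb_dim, hidden // 2,
                                  batch_first=True, bidirectional=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, label_token_ids):
        # (B, T, H): contextual token vectors of the input text
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        # (L, S) label-name tokens -> (L, S, H) -> mean over S -> (L, H)
        label_out, _ = self.label_lstm(self.label_word_emb(label_token_ids))
        u = label_out.mean(dim=1)
        # Attention of every label over every token: (B, T, L)
        att = torch.softmax(h @ u.t(), dim=1)
        # Label-specific text representation: (B, L, H)
        v = att.transpose(1, 2) @ h
        # One logit per label: (B, L)
        return self.score(v).squeeze(-1)
```

Training would pair the per-label logits with `nn.BCEWithLogitsLoss`, so that each label receives an independent sigmoid score while the BERT weights are fine-tuned jointly with the label encoder.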
Xueqiang LYU, Chen PENG, Le ZHANG, Zhi’an DONG, Xindong YOU. Text multi-label classification method incorporating BERT and label semantic attention[J]. Journal of Computer Applications, 2022, 42(1): 57-63.
Dataset | Total samples | Training samples | Validation samples | Test samples | Total labels | Avg. words per sample | Avg. labels per sample |
---|---|---|---|---|---|---|---|
RCV1-v2 | 804 414 | 563 090 | 120 662 | 120 662 | 103 | 123.94 | 3.24 |
AAPD | 55 840 | 39 088 | 8 376 | 8 376 | 54 | 163.42 | 2.41 |

Tab. 1 Dataset description
Model | AAPD Precision | AAPD Recall | AAPD F1 | RCV1-v2 Precision | RCV1-v2 Recall | RCV1-v2 F1 |
---|---|---|---|---|---|---|
BR | 0.644 | 0.648 | 0.646 | 0.904 | 0.816 | 0.858 |
CC | 0.657 | 0.651 | 0.654 | 0.887 | 0.828 | 0.857 |
LP | 0.662 | 0.608 | 0.634 | 0.896 | 0.824 | 0.858 |
CNN | 0.849 | 0.545 | 0.664 | 0.922 | 0.798 | 0.855 |
CNN-RNN | 0.718 | 0.618 | 0.664 | 0.889 | 0.825 | 0.856 |
SGM | 0.746 | 0.659 | 0.699 | 0.887 | 0.850 | 0.869 |
SGM+GE | 0.748 | 0.675 | 0.710 | 0.897 | 0.860 | 0.878 |
MAGNET | — | — | 0.696 | — | — | 0.885 |
TLA-BERT | 0.752 | 0.703 | 0.727 | 0.906 | 0.864 | 0.884 |

Tab. 2 Results of comparison experiments on AAPD and RCV1-v2 datasets
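The precision, recall, and F1 figures in Tab. 2 and Tab. 3 follow the standard multi-label evaluation protocol. A minimal sketch of how such scores can be computed from 0/1 prediction and gold matrices is shown below; the assumption that micro-averaging is used (as in the SGM line of work), along with the function name and toy data, is illustrative rather than taken from the paper.

```python
import numpy as np

def micro_prf(y_true: np.ndarray, y_pred: np.ndarray):
    """Micro-averaged precision/recall/F1 over a (samples, labels) 0/1 matrix."""
    tp = np.logical_and(y_true == 1, y_pred == 1).sum()
    fp = np.logical_and(y_true == 0, y_pred == 1).sum()
    fn = np.logical_and(y_true == 1, y_pred == 0).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy usage: 3 samples, 4 labels
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 1]])
print(micro_prf(y_true, y_pred))  # -> (0.8333..., 0.8333..., 0.8333...)
```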
Model | AAPD Precision | AAPD Recall | AAPD F1 | RCV1-v2 Precision | RCV1-v2 Recall | RCV1-v2 F1 |
---|---|---|---|---|---|---|
BERT-noTLA | 0.823 | 0.594 | 0.690 | 0.904 | 0.843 | 0.875 |
TLA-BERT | 0.752 | 0.703 | 0.727 | 0.906 | 0.864 | 0.884 |

Tab. 3 Results of ablation experiments on AAPD and RCV1-v2 datasets
1 | SCHAPIRE R E, SINGER Y. BoosTexter: a boosting-based system for text categorization[J]. Machine Learning, 2000, 39(2/3):135-168. |
2 | GOPAL S, YANG Y M. Multilabel classification with meta-level features[C]// Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2010:315-322. 10.1145/1835449.1835503 |
3 | KATAKIS I, TSOUMAKAS G, VLAHAVAS I. Multilabel text classification for automated tag suggestion[C/OL]// Proceedings of the 2008 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. [2020-12-13]. |
4 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2021-02-05]. |
5 | PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014:1532-1543. 10.3115/v1/d14-1162 |
6 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780. 10.1162/neco.1997.9.8.1735 |
7 | PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018:2227-2237. 10.18653/v1/n18-1202 |
8 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017:6000-6010. |
9 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019:4171-4186. 10.18653/v1/n19-1423 |
10 | YANG W, XIE Y Q, LIN A, et al. End-to-end open-domain question answering with BERTserini[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Stroudsburg, PA: Association for Computational Linguistics, 2019:72-77. 10.18653/v1/n19-4013 |
11 | SUN C, QIU X P, XU Y G, et al. How to fine-tune BERT for text classification?[C]// Proceedings of the 18th China National Conference on Chinese Computational Linguistics, LNCS11856. Cham: Springer, 2019:194-206. |
12 | XU H, LIU B, SHU L, et al. BERT post-training for review reading comprehension and aspect-based sentiment analysis[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019:2324-2335. |
13 | CHEN Z Y, TRABELSI M, HEFLIN J, et al. Table search using a deep contextualized language model[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020:589-598. 10.1145/3397271.3401044 |
14 | YANG P C, SUN X, LI W, et al. SGM: sequence generation model for multi-label classification[C]// Proceedings of the 27th International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018:3915-3926. |
15 | LEWIS D D, YANG Y M, ROSE T G, et al. RCV1: a new benchmark collection for text categorization research[J]. Journal of Machine Learning Research, 2004, 5:361-397. |
16 | YEN I E H, HUANG X R, DAI W, et al. PPDsparse: a parallel primal-dual sparse method for extreme classification[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2017:545-553. 10.1145/3097983.3098083 |
17 | YANG J Y, LIU Y, LUO J. Multi-label extreme classification based on subset topic model[J]. Computer Engineering and Design, 2020, 41(12):3432-3437. 10.16208/j.issn1000-7024.2020.12.020 |
18 | JAIN H, BALASUBRAMANIAN V, CHUNDURI B, et al. Slice: scalable linear extreme classifiers trained on 100 million labels for related searches[C]// Proceedings of the 12th ACM International Conference on Web Search and Data Mining. New York: ACM, 2019:528-536. 10.1145/3289600.3290979 |
19 | YAO J Q, XU Z G, YAN J K, et al. Dynamic multi-label text classification algorithm based on label semantic similarity[J]. Computer Engineering and Applications, 2020, 56(19):94-98. |
20 | TAN H F, LIU Z Y. Multi-label K nearest neighbor algorithm by exploiting label correlation[J]. Journal of Computer Applications, 2015, 35(10):2761-2765. 10.11772/j.issn.1001-9081.2015.10.2761 |
21 | PRABHU Y, VARMA M. FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014:263-272. 10.1145/2623330.2623651 |
22 | YOU R H, ZHANG Z H, WANG Z Y, et al. AttentionXML: label tree-based attention-aware deep model for high-performance extreme multi-label text classification[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2020-12-14]. |
23 | JAIN H, PRABHU Y, VARMA M. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016:935-944. 10.1145/2939672.2939756 |
24 | XIAO L, CHEN B L, HUANG X, et al. Multi-label text classification method based on label semantic information[J]. Journal of Software, 2020, 31(4):1079-1089. 10.13328/j.cnki.jos.005923 |
25 | WANG M R, GAO S, YUAN Z Y, et al. Sequence generation model with dynamic routing for multi-label text classification[J]. Journal of Computer Applications, 2020, 40(7):1884-1890. 10.11772/j.issn.1001-9081.2019112027 |
26 | YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2020-12-14]. |
27 | MANNING C D, RAGHAVAN P, SCHÜTZE H. Introduction to information retrieval[M]. Cambridge: Cambridge University Press, 2008:334-336. 10.1017/cbo9780511809071 |
28 | BOUTELL M R, LUO J B, SHEN X P, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9):1757-1771. 10.1016/j.patcog.2004.03.009 |
29 | READ J, PFAHRINGER B, HOLMES G, et al. Classifier chains for multi-label classification[J]. Machine Learning, 2011, 85(3):333-359. 10.1007/s10994-011-5256-5 |
30 | TSOUMAKAS G, KATAKIS I. Multi-label classification: an overview[J]. International Journal of Data Warehousing and Mining, 2007, 3(3):1-13. 10.4018/jdwm.2007070101 |
31 | KIM Y. Convolutional neural networks for sentence classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014:1746-1751. 10.3115/v1/d14-1181 |
32 | CHEN G B, YE D H, XING Z C, et al. Ensemble application of convolutional and recurrent neural networks for multi-label text categorization[C]// Proceedings of the 2017 International Joint Conference on Neural Networks. Piscataway: IEEE, 2017:2377-2383. 10.1109/ijcnn.2017.7966144 |
33 | PAL A, SELVAKUMAR M, SANKARASUBBU M. MAGNET: multi-label text classification using attention-based graph neural network[C]// Proceedings of the 12th International Conference on Agents and Artificial Intelligence, Volume 2: ICAART. Setúbal: SciTePress, 2020:494-505. |
34 | KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2020-12-03]. |
35 | SRIVASTAVA N, HINTON G E, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15:1929-1958. |