Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (1): 57-63. DOI: 10.11772/j.issn.1001-9081.2021020366
Xueqiang LYU, Chen PENG, Le ZHANG, Zhi’an DONG, Xindong YOU
Received: 2021-03-11
Revised: 2021-04-28
Accepted: 2021-04-29
Online: 2021-05-21
Published: 2022-01-10
Contact: Le ZHANG
About author: LYU Xueqiang, born in 1970, Ph. D., professor, CCF member. His research interests include Chinese and multimedia information processing.
Abstract: Multi-Label Text Classification (MLTC) is one of the important subtasks in the field of Natural Language Processing (NLP). To deal with the complex correlations among multiple labels, an MLTC method fusing BERT and label semantic attention, named TLA-BERT, was proposed. First, the contextual vector representation of the input text was learned by fine-tuning an autoencoding pre-trained model. Then, each label was encoded individually by a Long Short-Term Memory (LSTM) network. Finally, an attention mechanism was used to explicitly highlight the contribution of the text to each label in order to predict the multi-label sequence. Experimental results show that, compared with the Sequence Generation Model (SGM) algorithm, the proposed method improves the F1 score by 2.8 percentage points and 1.5 percentage points on the public AAPD and RCV1-v2 datasets, respectively.
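The pipeline described in the abstract (a fine-tuned BERT text encoder, an independent LSTM encoder for label semantics, and label-to-token attention) can be illustrated with a short PyTorch sketch. Everything below is a hedged reconstruction, not the authors' released code: the class name `TLABertSketch`, the use of `bert-base-uncased`, the mean-pooling of BiLSTM outputs into one vector per label, and the omission of padding masks in the attention step are all illustrative assumptions.

```python
# Minimal sketch of the TLA-BERT idea, under the assumptions stated above.
import torch
import torch.nn as nn
from transformers import BertModel

class TLABertSketch(nn.Module):
    """BERT text encoder + per-label BiLSTM encoder + label-word attention."""
    def __init__(self, label_vocab_size: int, label_emb_dim: int = 300,
                 bert_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)  # fine-tuned jointly
        hidden = self.bert.config.hidden_size             # 768 for bert-base
        self.label_word_emb = nn.Embedding(label_vocab_size, label_emb_dim)
        # BiLSTM encodes each label's name separately; 2 * (hidden // 2)
        # matches BERT's hidden size so dot-product attention is possible.
        self.label_lstm = nn.LSTM(label_emb_dim, hidden // 2,
                                  batch_first=True, bidirectional=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, label_token_ids):
        # (B, T, H): contextual token vectors of the input text
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        # (L, S) label-name tokens -> (L, S, H) -> mean over S -> (L, H)
        label_out, _ = self.label_lstm(self.label_word_emb(label_token_ids))
        u = label_out.mean(dim=1)
        # Attention of every label over every token: (B, T, L)
        att = torch.softmax(h @ u.t(), dim=1)
        # Label-specific text representation: (B, L, H)
        v = att.transpose(1, 2) @ h
        # One logit per label: (B, L)
        return self.score(v).squeeze(-1)
```

Training would pair the per-label logits with `nn.BCEWithLogitsLoss`, so that each label receives an independent sigmoid score while the BERT weights are fine-tuned jointly with the label encoder.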
Xueqiang LYU, Chen PENG, Le ZHANG, Zhi’an DONG, Xindong YOU. Text multi-label classification method incorporating BERT and label semantic attention[J]. Journal of Computer Applications, 2022, 42(1): 57-63.
Dataset | Total samples | Training samples | Validation samples | Test samples | Total labels | Avg. words per sample | Avg. labels per sample |
---|---|---|---|---|---|---|---|
RCV1-v2 | 804 414 | 563 090 | 120 662 | 120 662 | 103 | 123.94 | 3.24 |
AAPD | 55 840 | 39 088 | 8 376 | 8 376 | 54 | 163.42 | 2.41 |

Tab. 1 Dataset description
Model | AAPD Precision | AAPD Recall | AAPD F1 | RCV1-v2 Precision | RCV1-v2 Recall | RCV1-v2 F1 |
---|---|---|---|---|---|---|
BR | 0.644 | 0.648 | 0.646 | 0.904 | 0.816 | 0.858 |
CC | 0.657 | 0.651 | 0.654 | 0.887 | 0.828 | 0.857 |
LP | 0.662 | 0.608 | 0.634 | 0.896 | 0.824 | 0.858 |
CNN | 0.849 | 0.545 | 0.664 | 0.922 | 0.798 | 0.855 |
CNN-RNN | 0.718 | 0.618 | 0.664 | 0.889 | 0.825 | 0.856 |
SGM | 0.746 | 0.659 | 0.699 | 0.887 | 0.850 | 0.869 |
SGM+GE | 0.748 | 0.675 | 0.710 | 0.897 | 0.860 | 0.878 |
MAGNET | — | — | 0.696 | — | — | 0.885 |
TLA-BERT | 0.752 | 0.703 | 0.727 | 0.906 | 0.864 | 0.884 |

Tab. 2 Results of comparison experiments on AAPD and RCV1-v2 datasets
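The precision, recall, and F1 figures in Tab. 2 and Tab. 3 follow the standard multi-label evaluation protocol. A minimal sketch of how such scores can be computed from 0/1 prediction and gold matrices is shown below; the assumption that micro-averaging is used (as in the SGM line of work), along with the function name and toy data, is illustrative rather than taken from the paper.

```python
import numpy as np

def micro_prf(y_true: np.ndarray, y_pred: np.ndarray):
    """Micro-averaged precision/recall/F1 over a (samples, labels) 0/1 matrix."""
    tp = np.logical_and(y_true == 1, y_pred == 1).sum()
    fp = np.logical_and(y_true == 0, y_pred == 1).sum()
    fn = np.logical_and(y_true == 1, y_pred == 0).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy usage: 3 samples, 4 labels
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 0, 1]])
print(micro_prf(y_true, y_pred))  # -> (0.8333..., 0.8333..., 0.8333...)
```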
Model | AAPD Precision | AAPD Recall | AAPD F1 | RCV1-v2 Precision | RCV1-v2 Recall | RCV1-v2 F1 |
---|---|---|---|---|---|---|
BERT-noTLA | 0.823 | 0.594 | 0.690 | 0.904 | 0.843 | 0.875 |
TLA-BERT | 0.752 | 0.703 | 0.727 | 0.906 | 0.864 | 0.884 |

Tab. 3 Results of ablation experiments on AAPD and RCV1-v2 datasets
1 | SCHAPIRE R E, SINGER Y. BoosTexter: a boosting-based system for text categorization[J]. Machine Learning, 2000, 39(2/3):135-168. |
2 | GOPAL S, YANG Y M. Multilabel classification with meta-level features[C]// Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2010:315-322. 10.1145/1835449.1835503 |
3 | KATAKIS I, TSOUMAKAS G, VLAHAVAS I. Multilabel text classification for automated tag suggestion[C/OL]// Proceedings of the 2008 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. [2020-12-13]. |
4 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space[EB/OL]. (2013-09-07) [2021-02-05]. |
5 | PENNINGTON J, SOCHER R, MANNING C D. GloVe: global vectors for word representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014:1532-1543. 10.3115/v1/d14-1162 |
6 | HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780. 10.1162/neco.1997.9.8.1735 |
7 | PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA: Association for Computational Linguistics, 2018:2227-2237. 10.18653/v1/n18-1202 |
8 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017:6000-6010. |
9 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019:4171-4186. 10.18653/v1/n19-1423 |
10 | YANG W, XIE Y Q, LIN A, et al. End-to-end open-domain question answering with BERTserini[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). Stroudsburg, PA: Association for Computational Linguistics, 2019:72-77. 10.18653/v1/n19-4013 |
11 | SUN C, QIU X P, XU Y G, et al. How to fine-tune BERT for text classification?[C]// Proceedings of the 18th China National Conference on Chinese Computational Linguistics, LNCS11856. Cham: Springer, 2019:194-206. |
12 | XU H, LIU B, SHU L, et al. BERT post-training for review reading comprehension and aspect-based sentiment analysis[C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg, PA: Association for Computational Linguistics, 2019:2324-2335. |
13 | CHEN Z Y, TRABELSI M, HEFLIN J, et al. Table search using a deep contextualized language model[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. New York: ACM, 2020:589-598. 10.1145/3397271.3401044 |
14 | YANG P C, SUN X, LI W, et al. SGM: sequence generation model for multi-label classification[C]// Proceedings of the 27th International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2018:3915-3926. |
15 | LEWIS D D, YANG Y M, ROSE T G, et al. RCV1: a new benchmark collection for text categorization research[J]. Journal of Machine Learning Research, 2004, 5:361-397. |
16 | YEN I E H, HUANG X R, DAI W, et al. PPDsparse: a parallel primal-dual sparse method for extreme classification[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2017:545-553. 10.1145/3097983.3098083 |
17 | YANG J Y, LIU Y, LUO J. Multi-label extreme classification based on subset topic model[J]. Computer Engineering and Design, 2020, 41(12):3432-3437. 10.16208/j.issn1000-7024.2020.12.020 |
18 | JAIN H, BALASUBRAMANIAN V, CHUNDURI B, et al. Slice: scalable linear extreme classifiers trained on 100 million labels for related searches[C]// Proceedings of the 12th ACM International Conference on Web Search and Data Mining. New York: ACM, 2019:528-536. 10.1145/3289600.3290979 |
19 | YAO J Q, XU Z G, YAN J K, et al. Dynamic multi-label text classification algorithm based on label semantic similarity[J]. Computer Engineering and Applications, 2020, 56(19):94-98. |
20 | TAN H F, LIU Z Y. Multi-label K nearest neighbor algorithm by exploiting label correlation[J]. Journal of Computer Applications, 2015, 35(10):2761-2765. 10.11772/j.issn.1001-9081.2015.10.2761 |
21 | PRABHU Y, VARMA M. FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning[C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014:263-272. 10.1145/2623330.2623651 |
22 | YOU R H, ZHANG Z H, WANG Z Y, et al. AttentionXML: label tree-based attention-aware deep model for high-performance extreme multi-label text classification[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2020-12-14]. |
23 | JAIN H, PRABHU Y, VARMA M. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016:935-944. 10.1145/2939672.2939756 |
24 | XIAO L, CHEN B L, HUANG X, et al. Multi-label text classification method based on label semantic information[J]. Journal of Software, 2020, 31(4):1079-1089. 10.13328/j.cnki.jos.005923 |
25 | WANG M R, GAO S, YUAN Z Y, et al. Sequence generation model with dynamic routing for multi-label text classification[J]. Journal of Computer Applications, 2020, 40(7):1884-1890. 10.11772/j.issn.1001-9081.2019112027 |
26 | YANG Z, DAI Z, YANG Y, et al. XLNet: generalized autoregressive pretraining for language understanding[C/OL]// Proceedings of the 33rd Conference on Neural Information Processing Systems. [2020-12-14]. |
27 | MANNING C D, RAGHAVAN P, SCHÜTZE H. Introduction to information retrieval[M]. Cambridge: Cambridge University Press, 2008:334-336. 10.1017/cbo9780511809071 |
28 | BOUTELL M R, LUO J B, SHEN X P, et al. Learning multi-label scene classification[J]. Pattern Recognition, 2004, 37(9):1757-1771. 10.1016/j.patcog.2004.03.009 |
29 | READ J, PFAHRINGER B, HOLMES G, et al. Classifier chains for multi-label classification[J]. Machine Learning, 2011, 85(3):333-359. 10.1007/s10994-011-5256-5 |
30 | TSOUMAKAS G, KATAKIS I. Multi-label classification: an overview[J]. International Journal of Data Warehousing and Mining, 2007, 3(3):1-13. 10.4018/jdwm.2007070101 |
31 | KIM Y. Convolutional neural networks for sentence classification[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2014:1746-1751. 10.3115/v1/d14-1181 |
32 | CHEN G B, YE D H, XING Z C, et al. Ensemble application of convolutional and recurrent neural networks for multi-label text categorization[C]// Proceedings of the 2017 International Joint Conference on Neural Networks. Piscataway: IEEE, 2017:2377-2383. 10.1109/ijcnn.2017.7966144 |
33 | PAL A, SELVAKUMAR M, SANKARASUBBU M. MAGNET: multi-label text classification using attention-based graph neural network[C]// Proceedings of the 12th International Conference on Agents and Artificial Intelligence, Volume 2: ICAART. Setúbal: SciTePress, 2020:494-505. |
34 | KINGMA D P, BA J L. Adam: a method for stochastic optimization[EB/OL]. (2017-01-30) [2020-12-03]. |
35 | SRIVASTAVA N, HINTON G E, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15:1929-1958. |