Sequence generation model with dynamic routing for multi-label text classification

Journal of Computer Applications, 2020, Vol. 40, Issue 7: 1884-1890. DOI: 10.11772/j.issn.1001-9081.2019112027

WANG Minrui, GAO Shu, YUAN Ziyong, YUAN Lei

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430063, China

Received: 2019-11-28 Revised: 2020-02-10 Online: 2020-06-29 Published: 2020-07-10
Corresponding author: GAO Shu
About the authors: WANG Minrui (born 1995), female, from Nanchang, Jiangxi, M.S. candidate; research interest: natural language processing. GAO Shu (born 1967), female, from Wuhan, Hubei, professor, Ph.D.; research interests: intelligent computing and semantic recognition, data analysis and applications. YUAN Ziyong (born 1995), male, from Bozhou, Anhui, M.S. candidate; research interest: natural language processing. YUAN Lei (born 1997), female, from Chuzhou, Anhui, M.S. candidate; research interest: natural language processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (51679180).

Abstract

In the real world, multi-label text has wider application scenarios than single-label text, but its huge output space makes the classification task more challenging. This work treats multi-label text classification as a label sequence generation problem and applies the Sequence Generation Model (SGM) to it. Because the sequential structure of SGM is prone to cumulative errors, among other shortcomings, an SGM based on Dynamic Routing (DR-SGM) is proposed. The model follows the Encoder-Decoder pattern. In the Encoder layer, a Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with Attention encodes the semantic information. In the Decoder layer, a decoder structure based on dynamic routing is designed: a dynamic routing aggregation layer is added after the hidden layer, and the global sharing of routing parameters weakens the influence of cumulative errors. In addition, dynamic routing captures part-part and part-whole position information in the text, and an optimized dynamic routing algorithm further improves semantic aggregation. Experimental results show that DR-SGM improves multi-label text classification performance on the RCV1-V2, AAPD and Slashdot datasets.

Key words: multi-label text classification, Sequence Generation Model (SGM), capsule network, Dynamic Routing (DR), Bi-directional Long Short-Term Memory (Bi-LSTM) neural network
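
The abstract does not spell out the dynamic routing aggregation layer or its optimized routing algorithm, so the following is only a minimal sketch of the standard routing-by-agreement procedure from capsule networks (Sabour et al., 2017), not the DR-SGM variant. The function names, the `(n_in, n_out, dim)` layout of the vote tensor `u_hat`, and the choice of three iterations are all assumptions made for illustration. It shows how a set of input vectors could be aggregated into a smaller set of output capsules whose coupling coefficients are refined by agreement.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """Shrink each vector to a length in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)


def dynamic_routing(u_hat, num_iters=3):
    """Aggregate input capsules into output capsules by routing-by-agreement.

    u_hat: array of shape (n_in, n_out, dim); u_hat[i, j] is the vote of
           input capsule i for output capsule j (hypothetical layout).
    Returns (v, c): output capsules of shape (n_out, dim) and the coupling
    coefficients of shape (n_in, n_out) used to compute them.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits; zeros give uniform coupling
    for _ in range(num_iters):
        c = softmax(b, axis=1)  # each input spreads its weight over outputs
        s = np.einsum("ij,ijd->jd", c, u_hat)  # weighted sum of votes
        v = squash(s)  # output capsules with length below 1
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # reward agreeing votes
    return v, c


# Usage with random votes: 6 input capsules routed to 4 output capsules.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 4, 8))
v, c = dynamic_routing(u_hat)
print(v.shape, c.shape)  # (4, 8) (6, 4)
```

In a decoder of the kind the abstract describes, the votes would presumably be derived from hidden-layer features through learned transformations; here they are random numbers so that the routing loop itself can be run in isolation.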

CLC number: TP391

WANG Minrui, GAO Shu, YUAN Ziyong, YUAN Lei. Sequence generation model with dynamic routing for multi-label text classification[J]. Journal of Computer Applications, 2020, 40(7): 1884-1890.

