Long text classification combined with attention mechanism

doi:10.11772/j.issn.1001-9081.2017112652

Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (5): 1272-1277. DOI: 10.11772/j.issn.1001-9081.2017112652

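LU Ling, YANG Wu, WANG Yuanlun, LEI Zijian, LI Ying

College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400050, China

Received: 2017-11-07  Revised: 2017-12-04  Online: 2018-05-10  Published: 2018-05-24
Corresponding author: YANG Wu
About the authors: LU Ling (1975-), female, born in Chongqing, associate professor, M.S., CCF member; research interests: machine learning, natural language processing. YANG Wu (1965-), male, born in Chongqing, professor, M.S., CCF member; research interests: machine learning, information retrieval. WANG Yuanlun (1996-), male, born in Chongqing; research interests: machine learning, information retrieval. LEI Zijian (1997-), male, born in Chongqing; research interests: machine learning, information retrieval. LI Ying (1997-), female, born in Chongqing; research interests: machine learning, information retrieval.
Supported by:
The West Project of the National Social Science Foundation of China (17XXW005); the Scientific and Technological Research Project of Chongqing Municipal Education Commission (KJ1500903).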

Abstract

Abstract: News texts usually consist of tens to hundreds of sentences. Their large character counts and abundance of topic-irrelevant information degrade classification performance. To address this problem, a long text classification method combined with an attention mechanism was proposed. First, each sentence of a text was represented as a paragraph vector, and a neural network attention model between paragraph vectors and text categories was constructed to calculate the attention of each sentence. The mean square error of a sentence's attention vector was taken as its contribution to the categories, and sentences were filtered according to this contribution. Then a classifier based on a Convolutional Neural Network (CNN) was constructed, with the filtered text and its attention matrix taken respectively as the network input. Max pooling was used for feature filtering, and random dropout was used to reduce over-fitting. Experiments were conducted on the data set of the Chinese news text classification task, one of the shared tasks of Natural Language Processing and Chinese Computing (NLP&CC) 2014. When the filtered text was 82.74% of the length of the text before filtering, the classification accuracy over 19 news categories was 80.39%, which was 2.1% higher than the accuracy on the unfiltered text. The experimental results show that, combined with the attention mechanism, the proposed sentence filtering method and classification model can improve the accuracy of long text classification while filtering information at the sentence level.
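The sentence-filtering step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each sentence already has an attention vector over the categories (a hypothetical array `attention` of shape `(num_sentences, num_classes)`), it interprets the "mean square error" of that vector as its mean squared deviation from its own mean, and it keeps sentences whose score reaches a hypothetical `threshold`. The function names and the toy numbers are invented for illustration.

```python
import numpy as np


def sentence_contribution(attention: np.ndarray) -> np.ndarray:
    """Score each sentence by the mean squared deviation of its attention vector.

    attention: array of shape (num_sentences, num_classes).
    A flat attention vector (no preference for any class) scores 0.
    This reading of "mean square error" is an assumption.
    """
    mean = attention.mean(axis=1, keepdims=True)
    return ((attention - mean) ** 2).mean(axis=1)


def filter_sentences(sentences, attention, threshold):
    """Keep sentences whose score is at least `threshold` (assumed rule)."""
    scores = sentence_contribution(np.asarray(attention, dtype=float))
    keep = scores >= threshold
    kept = [s for s, k in zip(sentences, keep) if k]
    return kept, scores


# Toy example: 3 sentences, 4 classes (made-up attention values).
sentences = ["s1", "s2", "s3"]
attention = np.array([
    [0.25, 0.25, 0.25, 0.25],  # uniform attention, carries no class signal
    [0.70, 0.10, 0.10, 0.10],  # strongly peaked on one class
    [0.40, 0.30, 0.20, 0.10],  # mildly peaked
])
kept, scores = filter_sentences(sentences, attention, threshold=0.01)
```

In this toy run the uniform sentence is dropped and the two peaked sentences are kept. How the threshold is chosen in practice is not stated in the abstract, so it is left as a free parameter here.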

Key words: attention mechanism, Convolutional Neural Network (CNN), Paragraph Vector (PV), information filtering, text classification
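The roles of max pooling and dropout in a CNN text classifier, as named in the abstract and keywords, can be illustrated with a small NumPy sketch. It is not the authors' network: the input shape (one row per sentence vector), the filter width, the ReLU activation, and the helper names `conv1d_valid`, `max_over_time`, and `dropout` are all assumptions, and the attention-matrix input mentioned in the abstract is omitted.

```python
import numpy as np


def conv1d_valid(x: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Slide filters over consecutive rows of x (no padding), then apply ReLU.

    x: (num_sentences, dim); filters: (num_filters, width, dim).
    Returns feature maps of shape (num_sentences - width + 1, num_filters).
    """
    num_filters, width, _ = filters.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, num_filters))
    for t in range(steps):
        window = x[t:t + width]  # (width, dim)
        # Contract each filter with the window to get one value per filter.
        out[t] = np.tensordot(filters, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def max_over_time(feature_maps: np.ndarray) -> np.ndarray:
    """Max pooling: keep the strongest response of each filter over all positions."""
    return feature_maps.max(axis=0)


def dropout(features: np.ndarray, rate: float, rng) -> np.ndarray:
    """Inverted dropout: zero each feature with probability `rate`, rescale the rest."""
    mask = rng.random(features.shape) >= rate
    return features * mask / (1.0 - rate)


rng = np.random.default_rng(0)
text = rng.normal(size=(10, 8))       # 10 sentence vectors of dimension 8 (made up)
filters = rng.normal(size=(4, 3, 8))  # 4 filters, each spanning 3 sentences
maps = conv1d_valid(text, filters)    # shape (8, 4)
pooled = max_over_time(maps)          # shape (4,)
train_features = dropout(pooled, rate=0.5, rng=rng)
```

Max pooling reduces a variable-length text to one value per filter, and dropout is applied only during training. At inference time the pooled features would be used without the random mask.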

CLC number: TP391.1

LU Ling, YANG Wu, WANG Yuanlun, LEI Zijian, LI Ying. Long text classification combined with attention mechanism[J]. Journal of Computer Applications, 2018, 38(5): 1272-1277.
