Text sentiment classification algorithm based on feature selection and deep belief network

Journal of Computer Applications, 2019, Vol. 39, Issue (7): 1942-1947. DOI: 10.11772/j.issn.1001-9081.2018112363

XIANG Jinyong^1,2, YANG Wenzhong^1, SILAMU·Wushouer^1,2

1. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China;
2. Xinjiang Laboratory of Multi-Language Information Technology (Xinjiang University), Urumqi, Xinjiang 830046, China

Received: 2018-11-28 Revised: 2018-12-28 Online: 2019-07-10 Published: 2019-07-15
Corresponding author: YANG Wenzhong
About the authors: XIANG Jinyong (born 1992), male, from Yili, Xinjiang, M.S. candidate; main research interest: text sentiment classification. YANG Wenzhong (born 1973), male, from Luoyang, Henan, associate professor, Ph.D., CCF member; main research interests: network security and text sentiment analysis. SILAMU·Wushouer (born 1942), male, from Yili, Xinjiang, academician of the Chinese Academy of Engineering, professor, CCF member; main research interest: natural language processing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (U1603115, XJEDU2017T002, U1435215).

Abstract:

Because human language is complex, most text sentiment classification algorithms suffer from an excessively large vocabulary caused by redundancy. A Deep Belief Network (DBN) addresses this problem by learning the useful information in the input corpus through its several hidden layers. For large applications, however, DBN is time-consuming and computationally expensive. To address this, a semi-supervised sentiment classification algorithm is proposed: the text sentiment classification algorithm based on Feature Selection and Deep Belief Network (FSDBN). First, four feature selection methods, Document Frequency (DF), Information Gain (IG), chi-square statistics (CHI) and Mutual Information (MI), are used to filter out irrelevant features and so reduce the complexity of the vocabulary. The selected features are then fed into the DBN, which makes its learning phase more efficient. The proposed algorithm was applied to Chinese and Uygur text. Experimental results on a hotel review dataset show that the accuracy of FSDBN is 1.6% higher than that of DBN, and that its training time is half that of DBN.

Key words: Deep Belief Network (DBN), Deep Learning (DL), Feature Selection (FS), semi-supervised sentiment classification algorithm, Restricted Boltzmann Machine (RBM), text sentiment classification
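The feature selection stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the labels, the function names (`document_frequency`, `chi_square`, `select_top_k`, `vectorize`) and the cutoff `k = 2` are all assumptions made for illustration, and only two of the four criteria (DF and CHI) are shown. The DBN itself is not included; the binary vectors produced at the end stand in for the reduced input such a network would receive.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def chi_square(docs, labels, positive=1):
    """Chi-square score of each term against the positive class (2x2 table)."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        (pos_df if y == positive else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive documents containing the term
        b = neg_df[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def vectorize(docs, vocab):
    """Binary bag-of-words vectors restricted to the selected vocabulary."""
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for tokens in docs:
        row = [0] * len(vocab)
        for t in tokens:
            if t in index:
                row[index[t]] = 1
        vectors.append(row)
    return vectors


# Hypothetical toy corpus of tokenized hotel reviews (1 = positive, 0 = negative).
docs = [
    ["room", "clean", "staff", "friendly"],
    ["great", "location", "clean", "room"],
    ["room", "dirty", "staff", "rude"],
    ["terrible", "service", "dirty", "room"],
]
labels = [1, 1, 0, 0]

df = document_frequency(docs)
chi = chi_square(docs, labels)
vocab = select_top_k(chi, 2)   # reduced vocabulary
X = vectorize(docs, vocab)     # reduced input that a DBN would be trained on
```

In this toy corpus "room" appears in every review, so its chi-square score is zero and it is filtered out, while class-specific terms such as "clean" and "dirty" are kept. This is the kind of vocabulary reduction the abstract credits with making the DBN learning phase cheaper.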

CLC Number: TP391.1

XIANG Jinyong, YANG Wenzhong, SILAMU·Wushouer. Text sentiment classification algorithm based on feature selection and deep belief network[J]. Journal of Computer Applications, 2019, 39(7): 1942-1947.

