Journal of Computer Applications, 2019, Vol. 39, Issue (7): 1942-1947. DOI: 10.11772/j.issn.1001-9081.2018112363

• Artificial Intelligence •

Text sentiment classification algorithm based on feature selection and deep belief network

XIANG Jinyong1,2, YANG Wenzhong1, SILAMU·Wushouer1,2

  1. School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China;
  2. Xinjiang Laboratory of Multi-Language Information Technology, Xinjiang University, Urumqi Xinjiang 830046, China
  • Received:2018-11-28 Revised:2018-12-28 Online:2019-07-10 Published:2019-07-15
  • Corresponding author: YANG Wenzhong
  • About the authors: XIANG Jinyong (born 1992), male, from Yili, Xinjiang, M.S. candidate; main research interest: text sentiment classification. YANG Wenzhong (born 1973), male, from Luoyang, Henan, associate professor, Ph.D., CCF member; main research interests: network security and text sentiment analysis. SILAMU·Wushouer (born 1942), male, from Yili, Xinjiang, academician of the Chinese Academy of Engineering, professor, CCF member; main research interest: natural language processing.
  • Supported by:

    This work is partially supported by the National Natural Science Foundation of China (U1603115, XJEDU2017T002, U1435215).

Abstract:

Because of the complexity of human language, most text sentiment classification algorithms suffer from an excessively large vocabulary caused by redundancy. A Deep Belief Network (DBN) can address this problem by learning useful information from the input corpus through its several hidden layers. However, DBN is a time-consuming and computationally expensive algorithm for large applications. To address this, a semi-supervised sentiment classification algorithm, the text sentiment classification algorithm based on Feature Selection and Deep Belief Network (FSDBN), is proposed. First, feature selection methods, namely Document Frequency (DF), Information Gain (IG), CHI-square statistics (CHI) and Mutual Information (MI), are used to filter out irrelevant features and thereby reduce the complexity of the vocabulary. Then, the results of feature selection are fed into the DBN to make its learning phase more efficient. The proposed algorithm is applied to Chinese and Uygur. Experimental results on a hotel review dataset show that the accuracy of FSDBN is 1.6% higher than that of DBN, and that the training time of FSDBN is half that of DBN.

Key words: Deep Belief Network (DBN), Deep Learning (DL), Feature Selection (FS), semi-supervised sentiment classification algorithm, Restricted Boltzmann Machine (RBM), text sentiment classification
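
The feature-selection step named in the abstract can be illustrated with a minimal sketch. This record does not show the authors' implementation, so everything below is an assumption made for illustration: the function names, the toy tokenized reviews, the two-class form of the CHI statistic, chi2(t, c) = N(AD - BC)^2 / ((A + C)(B + D)(A + B)(C + D)), and the binary bag-of-words vectors. The DBN itself is omitted; the vectors only stand in for the reduced input that would be passed to it.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive=1):
    """Score each term with the two-class chi-square statistic.

    docs: list of token lists; labels: parallel list of class labels.
    Hypothetical helper, not taken from the paper.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == positive else df_neg
        target.update(set(tokens))  # count each term once per document
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def to_binary_vectors(docs, vocab):
    """Encode each document as 0/1 presence flags over the reduced vocabulary."""
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for tokens in docs:
        row = [0] * len(vocab)
        for token in tokens:
            if token in index:
                row[index[token]] = 1
        vectors.append(row)
    return vectors


# Toy corpus (made up): 1 = positive review, 0 = negative review.
docs = [
    ["room", "clean", "great"],
    ["room", "dirty", "bad"],
    ["great", "staff", "room"],
    ["bad", "noisy", "room"],
]
labels = [1, 0, 1, 0]
vocab = select_features(docs, labels, k=2)
inputs = to_binary_vectors(docs, vocab)  # reduced input for a downstream model
```

In this toy corpus the word "room" appears in every review, so its score is zero and it is dropped, while the class-specific words are kept. That is the kind of vocabulary reduction the abstract credits with shortening DBN training.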

CLC number: