Hierarchical attention-based neural network model for spam review detection

Journal of Computer Applications (计算机应用), 2018, Vol. 38, Issue (11): 3063-3068. DOI: 10.11772/j.issn.1001-9081.2018041356

• The 7th China Conference on Data Mining (CCDM 2018) •

LIU Yuxin¹, WANG Li², ZHANG Hao¹

1. College of Information and Computer, Taiyuan University of Technology, Jinzhong Shanxi 030600, China;
2. College of Data Science, Taiyuan University of Technology, Jinzhong Shanxi 030600, China

Received: 2018-04-30 Revised: 2018-06-26 Online: 2018-11-10 Published: 2018-11-10
Corresponding author: WANG Li
About the authors: LIU Yuxin (1984-), female, native of Taiyuan, Shanxi, Ph.D. candidate; main research interests: data mining, machine learning, deep learning. WANG Li (1971-), female, native of Taiyuan, Shanxi, professor, Ph.D.; main research interests: big data computing and analysis, knowledge graphs, data mining, artificial intelligence. ZHANG Hao (1988-), male, native of Taiyuan, Shanxi, lecturer, Ph.D.; main research interest: complex networks.
Supported by:
This work is partially supported by the National High Technology Research and Development Program of China (863 Program) (2014AA015204), the National Natural Science Foundation of China (61702356), the Natural Science Foundation of Shanxi Province (201703D421013), and a project of the Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences (CASNDST20140X).

Abstract

Abstract: Existing methods for detecting spam reviews mainly design features from linguistic and psychological cues, which can hardly reveal the latent semantic information of the reviews. To address this problem, a Hierarchical Attention-based Neural Network (HANN) model was proposed. The model consists mainly of two layers: the Word2Sent layer, which uses a Convolutional Neural Network (CNN) to produce continuous sentence representations on the basis of word embeddings, and the Sent2Doc layer, which uses an attention pooling-based neural network to generate document representations from those sentence representations. The generated document representations are used directly as the final features of a review and are classified by a softmax classifier. By fully preserving the position and intensity information of a review and extracting important and comprehensive information from it (the history, future, and local context of any position in the document), the model mines the latent semantic information of user reviews and thereby improves detection accuracy. Experimental results show that, compared with methods based on neural networks alone, the model improves accuracy by 5% on average and significantly improves classification performance.

Key words: spam review, representation learning, attention mechanism, Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BLSTM)
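
The two-layer pipeline described in the abstract can be sketched as a single forward pass. The following is a minimal illustration in plain Python, not the authors' implementation. The filter width `k`, all dimensions, the max-over-time pooling in `word2sent`, and the context-vector attention scoring in `sent2doc` are assumptions chosen for brevity, the function names are hypothetical, and the weights are random rather than trained.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(rows, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in rows]

def word2sent(words, filters, k):
    """Word2Sent (assumed form): slide a width-k window over the word
    vectors, apply tanh filters, then max-pool over window positions."""
    d_emb = len(words[0])
    # Zero-pad short sentences so that at least one window exists.
    padded = words + [[0.0] * d_emb] * max(0, k - len(words))
    # Each window is the concatenation of k consecutive word vectors.
    windows = [sum(padded[i:i + k], []) for i in range(len(padded) - k + 1)]
    feature_maps = [[math.tanh(h) for h in matvec(filters, w)] for w in windows]
    return [max(col) for col in zip(*feature_maps)]

def sent2doc(sents, W_att, u_att):
    """Sent2Doc (assumed form): score each sentence against a context
    vector u_att, normalise the scores, and take the weighted sum."""
    scores = [dot(u_att, [math.tanh(h) for h in matvec(W_att, s)]) for s in sents]
    alpha = softmax(scores)
    doc = [sum(a * s[j] for a, s in zip(alpha, sents)) for j in range(len(sents[0]))]
    return doc, alpha

def classify(doc, W_out):
    """Softmax classifier over two classes: [truthful, spam]."""
    return softmax(matvec(W_out, doc))

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_emb, d_sent, d_att, k = 8, 6, 5, 3  # illustrative sizes only
filters = rand_matrix(rng, d_sent, k * d_emb)
W_att = rand_matrix(rng, d_att, d_sent)
u_att = [rng.uniform(-0.5, 0.5) for _ in range(d_att)]
W_out = rand_matrix(rng, 2, d_sent)

# A toy "review": three sentences of 5, 2 and 7 random word vectors.
review = [rand_matrix(rng, n, d_emb) for n in (5, 2, 7)]

sents = [word2sent(s, filters, k) for s in review]
doc, alpha = sent2doc(sents, W_att, u_att)
probs = classify(doc, W_out)
print("attention weights:", [round(a, 3) for a in alpha])
print("class probabilities:", [round(p, 3) for p in probs])
```

The attention weights `alpha` sum to 1 across sentences, so the document vector is a convex combination of the sentence vectors. This is how the Sent2Doc step can emphasise some sentences over others before the softmax classifier sees the result.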

CLC number: TP183

LIU Yuxin, WANG Li, ZHANG Hao. Hierarchical attention-based neural network model for spam review detection[J]. Journal of Computer Applications, 2018, 38(11): 3063-3068.

