Deceptive review detection via hierarchical neural network model with attention mechanism

Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (7): 1925-1930. DOI: 10.11772/j.issn.1001-9081.2018112340

YAN Mengxiang¹, JI Donghong¹, REN Yafeng²

1. School of Cyber Science and Engineering, Wuhan University, Wuhan Hubei 430072, China;
2. Collaborative Innovation Center for Language Research and Service, Guangdong University of Foreign Studies, Guangzhou Guangdong 510420, China

Received: 2018-11-26 Revised: 2019-02-16 Online: 2019-07-15 Published: 2019-07-10
Corresponding author: REN Yafeng
About the authors: YAN Mengxiang (born 1993), female, from Xiaogan, Hubei, M.S. candidate; research interest: natural language processing. JI Donghong (born 1967), male, from Zhumadian, Henan, professor, doctoral supervisor, Ph.D.; research interests: natural language processing, machine learning, data mining. REN Yafeng (born 1986), male, from Jiaozuo, Henan, associate research fellow, Ph.D.; research interests: natural language processing, machine learning.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61702121, 61772378).

Abstract:

To address the problem that traditional discrete models fail to capture the global semantic information of an entire review text in deceptive review detection, a hierarchical neural network model with an attention mechanism was proposed. Firstly, different neural network models were used to model the discourse structure of the review text, in order to determine which model obtains the best document representation. Then, the review text was modeled with two attention mechanisms, based on the user view and the product view respectively: the user view focuses on the user's preferences expressed in the review text, while the product view focuses on the product's features. Finally, the review representations learned from the two views were concatenated to form the final representation for predicting deceptive reviews. Experiments were carried out on the Yelp dataset with accuracy as the evaluation metric. The experimental results show that the proposed hierarchical attention neural network model performs best, with an accuracy 1 to 4 percentage points higher than those of traditional discrete models and existing neural network baseline models.

Key words: attention mechanism, deceptive review, discrete feature, neural network, Long Short-Term Memory (LSTM) network
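The two-view attention idea described in the abstract can be illustrated with a minimal sketch. It assumes a simplified dot-product attention score and omits the neural sentence encoders (such as LSTM) that the paper compares; the function names, the toy word vectors, and the user and product embeddings are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by its dot product with `context`, then sum.

    Assumption: plain dot-product scoring stands in for the learned
    attention scoring function of the actual model.
    """
    scores = [sum(v * c for v, c in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def encode_review(review, context):
    """Two-level pooling: words -> sentence vectors -> one review vector."""
    sentence_vectors = [attention_pool(words, context) for words in review]
    return attention_pool(sentence_vectors, context)


def two_view_representation(review, user_vec, product_vec):
    """Concatenate the user-view and product-view review vectors."""
    return encode_review(review, user_vec) + encode_review(review, product_vec)


# Toy review: 2 sentences, each a list of 3-dimensional word vectors.
review = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.5]],
]
user_vec = [2.0, 0.0, 0.0]     # hypothetical user embedding
product_vec = [0.0, 0.0, 2.0]  # hypothetical product embedding

representation = two_view_representation(review, user_vec, product_vec)
print(len(representation))  # 6: two 3-dimensional views concatenated
```

In a full model, the concatenated vector would be passed to a classifier that predicts whether the review is deceptive; that step is left out here to keep the sketch focused on the two attention views.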

CLC Number: TP181

YAN Mengxiang, JI Donghong, REN Yafeng. Deceptive review detection via hierarchical neural network model with attention mechanism[J]. Journal of Computer Applications, 2019, 39(7): 1925-1930.

