Opinion spam detection based on hierarchical heterogeneous graph attention network

doi:10.11772/j.issn.1001-9081.2020081190

Journal of Computer Applications, 2021, Vol. 41, Issue (5): 1275-1281.

Special topic: Artificial Intelligence

ZHANG Rong, ZHANG Xianguo

College of Computer Science, Inner Mongolia University, Huhhot, Inner Mongolia 010000, China

Received: 2020-08-10 Revised: 2020-08-25 Published: 2021-05-10 Released online: 2020-11-05
Corresponding author: ZHANG Xianguo
About the authors: ZHANG Rong (born 1996), female, native of Huhhot, Inner Mongolia, M.S. candidate; main research interest: data mining. ZHANG Xianguo (born 1973), male, native of Xingxian, Shanxi, associate professor, M.S., CCF member; main research interests: machine learning, data mining, and social behavior mining and analysis.
Supported by:
This work is partially supported by the Regional Program of the National Natural Science Foundation of China (41761086).

Abstract

Abstract: To address the problem that the non-semantic features of reviews cannot be fully utilized in opinion spam detection, a new model, the Hierarchical Heterogeneous Graph Attention Network (HHGAN), was proposed based on a hierarchical attention mechanism and a heterogeneous graph attention network. Firstly, the hierarchical attention mechanism was used to learn word-level and sentence-level document representations of the review text, focusing on the words and sentences that are important for opinion spam detection. Then, the learned document representations were used as nodes, and non-semantic features of the reviews were selected as meta-paths to construct a heterogeneous graph attention network with a two-level attention mechanism. Finally, a Multi-Layer Perceptron (MLP) was designed to classify the reviews. Experimental results show that the HHGAN model reaches F1 values of 0.942 and 0.923 on the restaurant and hotel datasets extracted from yelp.com, respectively, clearly outperforming the traditional Convolutional Neural Network (CNN) model and other neural network baseline models.

Key words: opinion spam detection, representation learning, Graph Neural Network (GNN), hierarchical attention mechanism, Heterogeneous Graph Neural Network (HGNN)
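The three stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embedding size, the random untrained weights, the additive attention form, and the two meta-paths (`A_user` for "same reviewer" and `A_prod` for "same product") are all assumptions made for illustration, since the abstract does not say which non-semantic features are used. The node-level step also omits the feature projection and multi-head averaging that a full heterogeneous graph attention network would normally include.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(H, W, b, u):
    """Additive attention pooling: the rows of H are reduced to one vector."""
    scores = np.tanh(H @ W + b) @ u          # one score per row
    alpha = softmax(scores)                  # weights sum to 1
    return alpha @ H, alpha

def encode_review(sentences, params):
    """Word-level attention per sentence, then sentence-level attention."""
    sent_vecs = np.stack(
        [attention_pool(S, *params["word"])[0] for S in sentences]
    )
    doc_vec, _ = attention_pool(sent_vecs, *params["sent"])
    return doc_vec

def node_level_attention(X, A, a):
    """GAT-style attention over the neighbours defined by one meta-path.
    A must contain self-loops so that every row has at least one neighbour."""
    d = X.shape[1]
    e = (X @ a[:d])[:, None] + (X @ a[d:])[None, :]   # a^T [x_i || x_j]
    e = np.where(e > 0, e, 0.2 * e)                   # LeakyReLU
    e = np.where(A > 0, e, -np.inf)                   # mask non-neighbours
    return softmax(e, axis=1) @ X

def semantic_level_attention(Zs, W, b, q):
    """Weight the per-meta-path embeddings and fuse them into one matrix."""
    w = np.array([np.mean(np.tanh(Z @ W + b) @ q) for Z in Zs])
    beta = softmax(w)
    return sum(bk * Z for bk, Z in zip(beta, Zs)), beta

def mlp(Z, W1, b1, W2, b2):
    """One hidden ReLU layer followed by a softmax over two classes."""
    h = np.maximum(Z @ W1 + b1, 0)
    return softmax(h @ W2 + b2, axis=1)

d = 8  # assumed embedding size
params = {
    "word": (rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)),
    "sent": (rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)),
}
# Four toy reviews; each is a list of sentences, each sentence a
# (num_words, d) matrix of stand-in word embeddings.
reviews = [
    [rng.normal(size=(int(rng.integers(3, 7)), d))
     for _ in range(int(rng.integers(1, 4)))]
    for _ in range(4)
]
X = np.stack([encode_review(r, params) for r in reviews])   # review nodes

# Hypothetical meta-path adjacencies between reviews (with self-loops).
A_user = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
A_prod = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]])

Zs = [node_level_attention(X, A, rng.normal(size=2 * d))
      for A in (A_user, A_prod)]
Z, beta = semantic_level_attention(
    Zs, rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)
)
probs = mlp(Z, rng.normal(size=(d, 16)), np.zeros(16),
            rng.normal(size=(16, 2)), np.zeros(2))
```

Each row of `probs` is a two-class distribution for one review; with untrained weights the values carry no meaning and only show how the data flows from word embeddings to a class decision. In a trained model all weight matrices would be learned jointly from labelled reviews.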

CLC Number: TP391.1

ZHANG Rong, ZHANG Xianguo. Opinion spam detection based on hierarchical heterogeneous graph attention network[J]. Journal of Computer Applications, 2021, 41(5): 1275-1281.
