Fake review detection algorithm combining Gaussian mixture model and text graph convolutional network

doi:10.11772/j.issn.1001-9081.2023020219

Abstract

The window-based edge-weight threshold used in the Text Graph Convolutional Network (Text GCN) is too coarse to capture word associations reliably. To mine the word association structure more accurately and improve prediction accuracy, a fake review detection algorithm named F-Text GCN, which combines a Gaussian Mixture Model (GMM) with Text GCN, is proposed. First, because fake reviews are far fewer than normal reviews in the training data and their edge signal is correspondingly weak, the GMM is used to separate the noise component of the edge-weight distribution and thereby strengthen that signal. Second, to reflect the diversity of information sources, the adjacency matrix is constructed by combining documents, words, reviews and non-text features. Finally, the fake-review association structure in the adjacency matrix is extracted through the spectral decomposition of Text GCN and used for prediction. Validation experiments were performed on 126,086 real Chinese reviews collected from a large domestic e-commerce platform. For fake review detection, F-Text GCN reaches an F1 of 82.92%, a relative improvement of 10.46% over BERT (Bidirectional Encoder Representations from Transformers), 11.60% over Text CNN, and 2.94% over Text GCN using review text alone. For highly imitative fake reviews, which are hard to detect, F-Text GCN was applied as a second-pass detector on the samples that a Support Vector Machine (SVM) had difficulty classifying; it achieves an overall prediction accuracy of 94.71%, a relative improvement of 2.91% over Text GCN and 14.54% over SVM. The second-order graph neighbor structure of fake reviews exposes vocabulary that strongly interferes with consumer decision-making, which indicates that the proposed algorithm is especially suited to extracting long-range word collocation structures and global sentence-level feature pattern variations for fake review detection.
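The "edge weight window threshold" mentioned in the abstract refers to how Text GCN builds word-word edges. In the original Text GCN formulation these edges are weighted by pointwise mutual information (PMI) computed over sliding windows, and only positive values are kept. The following is a minimal sketch of that baseline step, not the authors' code: the function name `pmi_edges`, the default window width, and the assumption that reviews arrive already segmented into tokens are all illustrative.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edges(docs, window=3):
    """Return {(word_a, word_b): pmi} for word pairs with positive PMI.

    docs   : list of token lists (reviews assumed to be segmented already)
    window : sliding-window width in tokens (hypothetical default)
    """
    word_windows = Counter()   # number of windows containing each word
    pair_windows = Counter()   # number of windows containing each word pair
    n_windows = 0
    for tokens in docs:
        # A document no longer than the window counts as a single window.
        if len(tokens) <= window:
            spans = [tokens]
        else:
            spans = [tokens[i:i + window]
                     for i in range(len(tokens) - window + 1)]
        for span in spans:
            n_windows += 1
            vocab = sorted(set(span))
            word_windows.update(vocab)
            pair_windows.update(combinations(vocab, 2))
    edges = {}
    for (a, b), n_ab in pair_windows.items():
        # PMI = log( p(a, b) / (p(a) * p(b)) ), probabilities taken over windows
        pmi = math.log(n_ab * n_windows / (word_windows[a] * word_windows[b]))
        if pmi > 0:            # fixed threshold: keep positive-PMI edges only
            edges[(a, b)] = pmi
    return edges

reviews = [["fast", "shipping", "good"], ["good", "price"], ["fast", "shipping"]]
print(pmi_edges(reviews, window=2))
```

The fixed cut at zero is the kind of threshold the abstract describes as insufficient: it treats every positive-PMI pair as signal, regardless of how the weights are distributed.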

Key words: Gaussian Mixture Model (GMM), fake review detection, Text Graph Convolutional Network (Text GCN), adjacency matrix, co-occurrence word network
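The abstract states that a GMM is used to separate the noise part of the edge-weight distribution, but it does not give the number of components, the quantity being modeled, or the decision rule. The sketch below is therefore only one plausible reading: it fits a two-component one-dimensional mixture to edge weights with plain EM and keeps the edges that most likely belong to the higher-mean component. The names `fit_gmm_1d`, `keep_signal_edges` and `min_posterior`, and the toy weights, are hypothetical.

```python
import math

def _responsibilities(xs, weights, means, variances):
    """E-step: posterior probability of each component for each value."""
    out = []
    for x in xs:
        dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        out.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
    return out

def fit_gmm_1d(xs, n_iter=200, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(xs)
    lo, hi = min(xs), max(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, var_floor)
    weights = [0.5, 0.5]
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # spread the start
    variances = [var_all, var_all]
    for _ in range(n_iter):
        resp = _responsibilities(xs, weights, means, variances)
        for k in range(2):  # M-step: re-estimate each component
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                var_floor)
    return weights, means, variances

def keep_signal_edges(edges, min_posterior=0.5):
    """Keep edges whose weight most likely comes from the higher-mean component.

    Treating the lower-mean component as noise is an assumption of this sketch.
    """
    keys = list(edges)
    xs = [edges[k] for k in keys]
    weights, means, variances = fit_gmm_1d(xs)
    signal = 0 if means[0] > means[1] else 1
    resp = _responsibilities(xs, weights, means, variances)
    return {k: edges[k] for k, r in zip(keys, resp) if r[signal] >= min_posterior}

# Toy edge weights: six weak (noise-like) edges and four strong ones.
toy = {("a", "b"): 0.10, ("a", "c"): 0.15, ("a", "d"): 0.20, ("b", "c"): 0.12,
       ("b", "d"): 0.18, ("c", "d"): 0.25, ("e", "f"): 2.0, ("e", "g"): 2.2,
       ("f", "g"): 2.1, ("f", "h"): 2.3}
print(sorted(keep_signal_edges(toy)))
```

Compared with a fixed cut-off, the boundary here adapts to the observed weight distribution. In practice a library implementation such as scikit-learn's `GaussianMixture` would likely replace the hand-written EM loop.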

CLC Number: TP391

Xing WANG, Guijuan LIU, Zhihao CHEN. Fake review detection algorithm combining Gaussian mixture model and text graph convolutional network[J]. Journal of Computer Applications, 2024, 44(2): 360-368.

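Type	Fake review example (translated from Chinese)
Ⅰ	1. Received the phone, like it a lot, satisfied; I recommend everyone buy it
	2. Received the phone and love it. The color is beautiful and delivery was especially fast. A very satisfying purchase; I will keep buying. Positive review
	3. The item has arrived, it's really great! Very satisfied! Happy, worth buying!!
Ⅱ	1. The packaging is exquisite and the item is quite finely made. The seller is especially nice and handles any problem promptly. Truly a very satisfying online purchase; I will come back to buy again
	2. At this price it's no loss; the quality is very good and I'm very satisfied. Customer service resolved my problem quickly with a great attitude, definitely a positive review! Cheap and good value; a big brand you can trust
	3. Good packaging, fast delivery, affordable price. A very satisfying purchase and worth the money. If you like it, grab it quickly and buy with confidence. Customer service attitude was also OK; I'll come again next time! The item fully matches the seller's description, and it's cheap too!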

Model	Fake reviews			Normal reviews
	P/%	R/%	F1/%	P/%	R/%	F1/%
BERT	87.44	65.76	75.07	91.34	97.45	94.30
Text CNN	78.48	70.54	74.30	92.27	94.78	93.51
Text GCN	87.64	74.52	80.55	93.39	97.17	95.24
F-Text CNN	78.98	70.58	74.54	92.29	94.93	93.59
F-Text GCN	87.87	78.86	82.92	94.46	97.22	95.82
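The improvements quoted in the abstract (10.46%, 11.60% and 2.94%) are relative gains, not percentage-point differences. The short check below reproduces them from the fake-review F1 column of the table above; the helper name `relative_gain` is illustrative.

```python
# Fake-review F1 values in percent, copied from the table above.
f1 = {"BERT": 75.07, "Text CNN": 74.30, "Text GCN": 80.55, "F-Text GCN": 82.92}

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100

for baseline in ("BERT", "Text CNN", "Text GCN"):
    gain = relative_gain(f1["F-Text GCN"], f1[baseline])
    print(f"F-Text GCN vs {baseline}: {gain:.2f}%")
# F-Text GCN vs BERT: 10.46%
# F-Text GCN vs Text CNN: 11.60%
# F-Text GCN vs Text GCN: 2.94%
```

In percentage points the same gaps are 7.85, 8.62 and 2.37.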

Model	Fake reviews						Normal reviews
	P		R		F1		P		R		F1
	Mean/%	Std. dev.	Mean/%	Std. dev.	Mean/%	Std. dev.	Mean/%	Std. dev.	Mean/%	Variance	Mean/%	Std. dev.
SVM	82.69		16.48		27.48		82.62		99.14		90.13
Text GCN	92.03	0.0095	81.80	0.0115	86.61	0.0055	95.58	0.0023	98.23	0.0020	96.89	0.0011
F-Text GCN	94.71	0.0108	82.38	0.0100	88.11	0.0053	95.74	0.0020	98.85	0.0028	97.27	0.0012
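The SVM row shows why F1, not precision alone, is the informative figure on the hard samples: fake-review precision is 82.69%, but recall is only 16.48%, so the harmonic mean collapses. The sketch below recomputes F1 for the SVM row, which reports single values. The helper name `f1_score` is illustrative. Rows reported as means over repeated runs need not satisfy the identity exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# SVM row from the table above, values in percent.
print(round(f1_score(82.69, 16.48), 2))  # fake reviews   -> 27.48
print(round(f1_score(82.62, 99.14), 2))  # normal reviews -> 90.13
```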