Journal of Computer Applications, 2024, Vol. 44, Issue (2): 360-368. DOI: 10.11772/j.issn.1001-9081.2023020219
Special Issue: Artificial intelligence
Xing WANG (1,2), Guijuan LIU (1,2), Zhihao CHEN (1,2)
Received: 2023-03-03
Revised: 2023-05-22
Accepted: 2023-05-24
Online: 2023-08-14
Published: 2024-02-10
Corresponding author: Xing WANG
About author: LIU Guijuan, born in 1986 in Heze, Shandong, M. S. candidate. Her research interests include natural language processing and deep learning.
Xing WANG, Guijuan LIU, Zhihao CHEN. Fake review detection algorithm combining Gaussian mixture model and text graph convolutional network[J]. Journal of Computer Applications, 2024, 44(2): 360-368.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023020219
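| Type | Fake review example |
|---|---|
| Ⅰ | 1. Received the phone. Really like it, satisfied, recommend that everyone buy it. |
| | 2. Received the phone. Like it very much, the color is beautiful, delivery was especially fast, a very satisfying purchase, will keep buying, positive review. |
| | 3. Got the item, it's really great! Very satisfied! Happy, worth buying!! |
| Ⅱ | 1. The packaging is exquisite, the item is finely made, the seller is especially nice and handles any problem promptly; truly a very satisfying online purchase, will come back and buy again. |
| | 2. Well worth it at this price, the quality is very good, very satisfied. Customer service solved problems quickly and had a great attitude, definitely a positive review! Cheap and good value; a big brand you can trust. |
| | 3. Good packaging, fast delivery, affordable price, a very satisfying purchase, worth the money; if you like it, grab it now and buy with confidence. Customer service attitude is also OK, will come again next time! The item fully matches the seller's description, and it's cheap! |

Tab. 1 Examples of fake review sentence patterns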
| Model | Fake reviews P/% | Fake reviews R/% | Fake reviews F1/% | Genuine reviews P/% | Genuine reviews R/% | Genuine reviews F1/% |
|---|---|---|---|---|---|---|
| BERT | 87.44 | 65.76 | 75.07 | 91.34 | 97.45 | 94.30 |
| Text CNN | 78.48 | 70.54 | 74.30 | 92.27 | 94.78 | 93.51 |
| Text GCN | 87.64 | 74.52 | 80.55 | 93.39 | 97.17 | 95.24 |
| F-Text CNN | 78.98 | 70.58 | 74.54 | 92.29 | 94.93 | 93.59 |
| F-Text GCN | 87.87 | 78.86 | 82.92 | 94.46 | 97.22 | 95.82 |

Tab. 2 Comparison of review detection performance among different models
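By the standard definition, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R), so each F1 entry in Tab. 2 can be cross-checked against the P and R beside it. A short sketch that does this for the fake-review columns (the function name `f1_score` is just an illustrative helper, not part of the paper):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in %), returned in %."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Fake-review precision and recall copied from Tab. 2.
rows = {
    "BERT": (87.44, 65.76),
    "Text CNN": (78.48, 70.54),
    "Text GCN": (87.64, 74.52),
    "F-Text CNN": (78.98, 70.58),
    "F-Text GCN": (87.87, 78.86),
}

for model, (p, r) in rows.items():
    print(f"{model:<11} F1 = {f1_score(p, r):.2f}")
```

Run on these columns, the formula reproduces the reported F1 of BERT, Text CNN, Text GCN and F-Text CNN to two decimals. For F-Text GCN it gives 83.12, whereas Tab. 2 lists 82.92; the figures shown here do not explain the gap (for example, F1 might have been averaged per run rather than computed from the averaged P and R).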
| Model | Fake P mean/% | Fake P std | Fake R mean/% | Fake R std | Fake F1 mean/% | Fake F1 std | Genuine P mean/% | Genuine P std | Genuine R mean/% | Genuine R variance | Genuine F1 mean/% | Genuine F1 std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 82.69 | - | 16.48 | - | 27.48 | - | 82.62 | - | 99.14 | - | 90.13 | - |
| Text GCN | 92.03 | 0.0095 | 81.80 | 0.0115 | 86.61 | 0.0055 | 95.58 | 0.0023 | 98.23 | 0.0020 | 96.89 | 0.0011 |
| F-Text GCN | 94.71 | 0.0108 | 82.38 | 0.0100 | 88.11 | 0.0053 | 95.74 | 0.0020 | 98.85 | 0.0028 | 97.27 | 0.0012 |

Tab. 3 Comparison of experimental results between Text GCN and F-Text GCN on easily confused data
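Tab. 3 reports, for each metric, a mean together with a spread, presumably over repeated runs. The number of runs is not stated in this section, and the spread values (e.g. 0.0095 next to a mean of 92.03%) appear to be on a 0-1 scale rather than in percent; both readings, and the choice of the sample (n - 1) estimator, are assumptions. A minimal sketch of how such summary columns can be computed with Python's standard `statistics` module, using made-up per-run scores:

```python
import statistics


def summarize(run_scores):
    """Summarize per-run scores given on a 0-1 scale.

    Returns (mean in percent, sample standard deviation on the 0-1 scale).
    ASSUMPTION: these mixed units mirror how Tab. 3 appears to be laid out.
    """
    mean_pct = statistics.mean(run_scores) * 100
    std = statistics.stdev(run_scores)  # sample (n - 1) estimator; needs >= 2 runs
    return mean_pct, std


# Hypothetical fake-review F1 scores from five runs (NOT the paper's data).
runs = [0.87, 0.88, 0.89, 0.88, 0.88]
mean_pct, std = summarize(runs)
print(f"F1: mean = {mean_pct:.2f}%, std = {std:.4f}")
```

For the one column labelled as a variance (genuine-review recall), `statistics.variance` would be the matching call.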
| Combination | Edge relations introduced | Fake reviews P/% | Fake reviews R/% | Fake reviews F1/% | Genuine reviews P/% | Genuine reviews R/% | Genuine reviews F1/% |
|---|---|---|---|---|---|---|---|
| E11 (membership) | Membership-label edges | 87.00 | 77.75 | 82.09 | 94.17 | 96.86 | 95.49 |
| E12 (image) | Image-label edges | 86.26 | 78.32 | 82.07 | 94.30 | 96.62 | 95.44 |
| E13 (video) | Video-label edges | 85.82 | 78.73 | 82.12 | 94.39 | 96.49 | 95.43 |
| E21 (membership + image) | Membership- and image-label edges | 86.58 | 78.65 | 82.42 | 94.38 | 96.71 | 95.53 |
| E22 (image + video) | Image- and video-label edges | 87.49 | 77.50 | 82.19 | 94.12 | 97.01 | 95.54 |
| E23 (membership + video) | Membership- and video-label edges | 86.35 | 78.51 | 82.24 | 94.35 | 96.65 | 95.49 |
| F-Text GCN | Membership-, image- and video-label edges | 87.87 | 78.86 | 82.92 | 94.46 | 97.22 | 95.82 |

Tab. 4 Design and results of ablation experiments
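Tab. 4 adds edges derived from review metadata (membership, image and video tags) to the text graph, but the exact construction is not given in this section. The sketch below assumes the standard Text GCN graph of Yao et al. (TF-IDF weights on document-word edges, positive PMI over sliding windows on word-word edges) and then links each review to hypothetical tag nodes with weight 1.0. The function name `build_text_graph`, the tag names, the window size and the unit weight are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter
from itertools import combinations


def build_text_graph(docs, doc_tags, window=3):
    """Build a Text GCN-style heterogeneous graph as (nodes, weighted edges).

    docs:     list of token lists, one per review.
    doc_tags: list of sets of metadata tags per review (hypothetical names
              such as "member", "image", "video").
    """
    vocab = sorted({w for d in docs for w in d})
    tags = sorted({t for s in doc_tags for t in s})
    nodes = [f"doc{i}" for i in range(len(docs))] + vocab + [f"tag:{t}" for t in tags]
    edges = {}

    # Document-word edges: TF-IDF, as in Text GCN.
    n_docs = len(docs)
    df = Counter(w for d in docs for w in set(d))
    for i, d in enumerate(docs):
        for w, c in Counter(d).items():
            weight = (c / len(d)) * math.log(n_docs / df[w])
            if weight > 0:  # words present in every document get weight 0
                edges[(f"doc{i}", w)] = weight

    # Word-word edges: positive PMI over fixed-size sliding windows.
    windows = []
    for d in docs:
        if len(d) <= window:
            windows.append(d)
        else:
            windows.extend(d[j:j + window] for j in range(len(d) - window + 1))
    n_win = len(windows)
    w_count = Counter(w for win in windows for w in set(win))
    pair_count = Counter(
        pair for win in windows for pair in combinations(sorted(set(win)), 2)
    )
    for (a, b), c in pair_count.items():
        pmi = math.log(c * n_win / (w_count[a] * w_count[b]))
        if pmi > 0:  # keep only positive PMI edges
            edges[(a, b)] = pmi

    # ASSUMPTION: each review links to its metadata tag nodes with weight 1.0.
    for i, s in enumerate(doc_tags):
        for t in s:
            edges[(f"doc{i}", f"tag:{t}")] = 1.0

    return nodes, edges


# Toy, made-up reviews and tags (not from the paper's dataset).
docs = [
    ["phone", "received", "very", "satisfied"],
    ["packaging", "good", "very", "satisfied"],
    ["phone", "broke", "after", "a", "week"],
]
tags = [{"member"}, {"image", "video"}, set()]
nodes, edges = build_text_graph(docs, tags, window=3)
print(len(nodes), "nodes,", len(edges), "edges")
```

Edges are stored once per unordered pair. To feed the graph into a GCN, the dictionary would be turned into a symmetric adjacency matrix A with self-loops and normalized as D^{-1/2}(A + I)D^{-1/2} (Kipf and Welling); that step is omitted to keep the sketch short.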