Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (5): 1275-1281. DOI: 10.11772/j.issn.1001-9081.2020081190

Special topic: Artificial Intelligence

Opinion spam detection based on hierarchical heterogeneous graph attention network

ZHANG Rong, ZHANG Xianguo

  1. College of Computer Science, Inner Mongolia University, Huhhot, Inner Mongolia 010000, China
  • Received: 2020-08-10 Revised: 2020-08-25 Online: 2021-05-10 Published: 2020-11-05
  • Corresponding author: ZHANG Xianguo
  • About the authors: ZHANG Rong (born 1996), female, from Huhhot, Inner Mongolia, M.S. candidate; main research interest: data mining. ZHANG Xianguo (born 1973), male, from Xingxian, Shanxi, associate professor, M.S., CCF member; main research interests: machine learning, data mining, and social behavior mining and analysis.
  • Supported by:
    This work is partially supported by the Regional Program of National Natural Science Foundation of China (41761086).

Abstract: To address the problem that the non-semantic features of reviews are not fully utilized in opinion spam detection, a new model, the Hierarchical Heterogeneous Graph Attention Network (HHGAN), was proposed, combining a hierarchical attention mechanism with a heterogeneous graph attention network. Firstly, the hierarchical attention mechanism was used to learn word-level and sentence-level document representations of the review text, focusing on the words and sentences that matter most for opinion spam detection. Then, the learned document representations were used as nodes, and non-semantic features of the reviews were selected as meta-paths to construct a heterogeneous graph attention network with a double-layer attention mechanism. Finally, a Multi-Layer Perceptron (MLP) was designed to classify the reviews. Experimental results show that the HHGAN model reaches F1 values of 0.942 and 0.923 on the restaurant and hotel datasets extracted from yelp.com, respectively, significantly outperforming the traditional Convolutional Neural Network (CNN) model and other neural network baseline models.

Key words: opinion spam detection, representation learning, Graph Neural Network (GNN), hierarchical attention mechanism, Heterogeneous Graph Neural Network (HGNN)
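
The first step described in the abstract, learning word-level and then sentence-level representations with attention, can be illustrated with a minimal sketch. It is not the HHGAN implementation: the abstract gives no architectural details, so the scoring form (a tanh projection compared against a learned context vector), the random parameters, the toy dimension `DIM`, and all function names below are assumptions made purely for illustration. Any sequence encoder, the graph attention stage, and the MLP classifier are omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, W, b, context):
    """Pool a list of d-dim vectors into one d-dim vector (assumed form).

    u_i = tanh(W h_i + b), alpha = softmax(u_i . context),
    output = sum_i alpha_i * h_i
    """
    scores = []
    for h in vectors:
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + bi)
             for row, bi in zip(W, b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alpha = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * h[k] for a, h in zip(alpha, vectors)) for k in range(dim)]
    return pooled, alpha


def random_params(dim, rng):
    """Hypothetical untrained parameters: projection W, bias b, context vector."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [0.0] * dim
    context = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return W, b, context


def encode_document(doc, word_params, sent_params):
    """doc is a list of sentences; each sentence is a list of word vectors."""
    # Word-level attention: one vector per sentence.
    sentence_vecs = [attention_pool(sentence, *word_params)[0] for sentence in doc]
    # Sentence-level attention: one vector for the whole review.
    return attention_pool(sentence_vecs, *sent_params)


rng = random.Random(0)
DIM = 4
word_params = random_params(DIM, rng)
sent_params = random_params(DIM, rng)

# Toy review with two sentences of three and two (random) word vectors.
doc = [[[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(n)] for n in (3, 2)]
doc_vec, sent_alpha = encode_document(doc, word_params, sent_params)
```

Here `doc_vec` plays the role of the learned document representation that the abstract says is used as a graph node, and `sent_alpha` holds the sentence-level attention weights, which are positive and sum to 1.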

CLC number: