Journal of Computer Applications, 2019, Vol. 39, Issue (6): 1595-1600. DOI: 10.11772/j.issn.1001-9081.2018122611

• Paper from the 2018 National Annual Conference on High Performance Computing (HPC China 2018)

Weighted reviewer graph based spammer group detection and characteristic analysis

ZHANG Qi1, JI Shujuan1, FU Qiang2, ZHANG Chunjin3

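  1. Shandong Key Laboratory of Wisdom Mine Information Technology (Shandong University of Science and Technology), Qingdao Shandong 266590, China;
    2. Technical Reconnaissance Detachment, Qinhuangdao Public Security Bureau, Qinhuangdao Hebei 066000, China;
    3. Network Information Center, Shandong University of Science and Technology, Qingdao Shandong 266590, China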
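  • Received: 2018-12-12  Revised: 2019-03-25  Online: 2019-06-10  Published: 2019-06-17
  • Corresponding author: JI Shujuan
  • About the authors: ZHANG Qi (1995-), male, born in Tai'an, Shandong, M.S. candidate, CCF member; main research interest: artificial intelligence. JI Shujuan (1977-), female, born in Tangshan, Hebei, associate professor, Ph.D., CCF senior member; main research interests: distributed intelligence, intelligent information processing. FU Qiang (1983-), male, born in Tangshan, Hebei, engineer; main research interests: intelligent information processing, false information detection. ZHANG Chunjin (1977-), male, born in Heze, Shandong, engineer, M.S.; main research interests: network security, intelligent information processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71772107, 61502281).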




Abstract: To address the problem of detecting spammer groups that write fake reviews on e-commerce platforms, a Weighted reviewer Graph based Spammer group detection Algorithm (WGSA) was proposed. Firstly, a weighted reviewer graph was built from the co-reviewing feature, with edge weights calculated from a series of group spam indicators. Then, a threshold was set on the edge weights to filter out suspicious subgraphs. Finally, exploiting the community structure of the graph, a community discovery algorithm was applied to generate the final spammer groups. Experimental results on the large Yelp dataset show that WGSA achieves higher accuracy than the K-Means clustering algorithm (KMeans), Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and a hierarchical clustering algorithm. The characteristics of and differences among the detected spammer groups were also analyzed; the analysis shows that groups with different activeness levels cause different degrees of harm, and that highly active groups are the most harmful and deserve the closest attention.
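The abstract names the three WGSA stages (weighted graph construction, edge thresholding, community discovery) but does not define the group spam indicators or say which community discovery algorithm is used. The Python sketch below is therefore only an illustration under stated assumptions: the three pairwise indicators (Jaccard overlap of product sets, share of co-reviews posted within `window` days, share of co-reviews with identical ratings), the threshold `tau`, and all function names are hypothetical, and plain weighted label propagation stands in for the paper's community discovery step.

```python
from collections import defaultdict
from itertools import combinations


def build_weighted_graph(reviews, window=3, min_common=2):
    """Build a weighted reviewer graph from (reviewer, product, rating, day) tuples.

    An edge joins two reviewers who co-reviewed at least `min_common` products.
    Its weight is the mean of three HYPOTHETICAL group spam indicators, not the
    indicators defined for WGSA:
      * Jaccard overlap of the two reviewers' product sets,
      * share of co-reviews posted within `window` days of each other,
      * share of co-reviews with identical ratings.
    """
    by_reviewer = defaultdict(dict)  # reviewer -> {product: (rating, day)}
    for reviewer, product, rating, day in reviews:
        by_reviewer[reviewer][product] = (rating, day)

    edges = {}
    for u, v in combinations(sorted(by_reviewer), 2):
        pu, pv = by_reviewer[u], by_reviewer[v]
        common = pu.keys() & pv.keys()
        if len(common) < min_common:
            continue
        jaccard = len(common) / len(pu.keys() | pv.keys())
        close_in_time = sum(abs(pu[p][1] - pv[p][1]) <= window for p in common) / len(common)
        same_rating = sum(pu[p][0] == pv[p][0] for p in common) / len(common)
        edges[(u, v)] = (jaccard + close_in_time + same_rating) / 3
    return edges


def filter_edges(edges, tau):
    """Keep only edges whose weight reaches the threshold `tau` (suspicious subgraph)."""
    return {pair: w for pair, w in edges.items() if w >= tau}


def label_propagation(edges, max_iter=100):
    """Weighted label propagation, a stand-in for the community discovery step.

    Nodes are visited in sorted order; on a tie a node keeps its current label,
    otherwise the smallest tied label wins, so the result is deterministic.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    labels = {node: node for node in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            score = defaultdict(float)  # neighbor label -> summed edge weight
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            best = max(score.values())
            candidates = sorted(lab for lab, s in score.items() if s == best)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    return [c for c in communities.values() if len(c) >= 2]


def detect_groups(reviews, tau=0.6):
    """Graph construction, then edge thresholding, then community discovery."""
    edges = filter_edges(build_weighted_graph(reviews), tau)
    return sorted((sorted(g) for g in label_propagation(edges)), key=lambda g: g[0])


# Toy data: a, b, c post near-simultaneous 5-star reviews on the same products.
sample_reviews = [
    ("a", "p1", 5, 1), ("a", "p2", 5, 1), ("a", "p3", 5, 2),
    ("b", "p1", 5, 1), ("b", "p2", 5, 2), ("b", "p3", 5, 2),
    ("c", "p1", 5, 2), ("c", "p2", 5, 1), ("c", "p3", 5, 3),
    ("d", "p1", 3, 10), ("d", "p4", 4, 40),
    ("e", "p1", 2, 30), ("e", "p4", 5, 90),
]
print(detect_groups(sample_reviews))  # → [['a', 'b', 'c']]
```

On the toy data, reviewers a, b and c share all three products, post within a day or two of each other and give identical ratings, so each of their edges scores 1.0. Reviewers d and e share two products but with distant dates and differing ratings, so their edge scores 1/3 and is removed by `tau=0.6` before community discovery. Label propagation was picked for the sketch only because it needs no preset number of clusters, unlike K-Means; the paper may use a different community discovery algorithm.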

Key words: e-commerce, spammer group, weighted reviewer graph, community discovery, clustering
