Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (4): 1214-1219. DOI: 10.11772/j.issn.1001-9081.2018091861

• Frontier, Interdisciplinary and Comprehensive Applications •

Dense subgraph based telecommunication fraud detection approach in bank

LIU Xiao (刘枭), WANG Xiaoguo (王晓国)

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201800, China
  • Received: 2018-09-07 Revised: 2018-11-08 Online: 2019-04-10 Published: 2019-04-10
  • Corresponding author: LIU Xiao
  • About the authors: LIU Xiao (born 1989), male, from Suzhou, Jiangsu, Ph.D. candidate; research interests: data mining and fraud detection. WANG Xiaoguo (born 1966), male, from Shangqiu, Henan, professor, Ph.D.; research interests: data mining and enterprise informatization.

Abstract: Banks have accumulated little labeled data on telecommunication fraud, and manual labeling is costly, so supervised learning methods for telecommunication fraud detection lack sufficient labeled data. To address this problem, an unsupervised learning method based on dense subgraphs is proposed for telecommunication fraud detection. First, fraudulent accounts are identified by searching the account-resource network (IP addresses and MAC addresses are collectively called resources) for subgraphs with high suspiciousness. Then, a subgraph suspiciousness metric that matches the characteristics of telecommunication fraud is designed. Finally, a suspicious-subgraph search algorithm is proposed that is disk-resident, has linear memory consumption and comes with theoretical guarantees. On two synthetic datasets, the proposed method achieves F1-scores of 0.921 and 0.861, higher than those of the CrossSpot, fBox and EvilCohort algorithms and close to those of the M-Zoom algorithm (0.899 and 0.898), while its average running time and peak memory consumption are both lower than those of M-Zoom. On a real-world dataset, the proposed method achieves an F1-score of 0.550, higher than those of fBox and EvilCohort and close to that of M-Zoom (0.529). Theoretical analysis and experimental results show that the proposed method is well suited to banks' current anti-telecommunication-fraud operations and to the large-scale datasets encountered in practice.

Key words: telecommunication fraud, unsupervised learning, fraud detection, dense subgraph, greedy algorithm

CLC number:
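The abstract describes searching an account-resource network for dense, suspicious subgraphs with a greedy algorithm. The sketch below illustrates that general idea only. It assumes the classic greedy peeling heuristic with average degree (edges divided by nodes) as a stand-in density score; it is not the paper's suspiciousness metric, and it runs fully in memory rather than being disk-resident. The function name `greedy_densest_subgraph` and the toy account and IP identifiers are hypothetical.

```python
import heapq
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling on a bipartite account-resource graph.

    Repeatedly removes the node with the fewest remaining edges and
    returns the intermediate node set with the highest edges/nodes ratio.
    The ratio is an assumed stand-in, not the paper's metric.
    """
    adj = defaultdict(set)
    for account, resource in edges:
        a, r = ("acct", account), ("res", resource)
        adj[a].add(r)
        adj[r].add(a)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    heap = [(d, node) for node, d in degree.items()]
    heapq.heapify(heap)

    alive = set(adj)
    n_edges = sum(degree.values()) // 2
    best_density = n_edges / len(alive) if alive else 0.0
    removal_order = []
    best_removed = 0  # number of removals before the best snapshot

    while heap:
        d, node = heapq.heappop(heap)
        if node not in alive or d != degree[node]:
            continue  # stale heap entry left over from a degree update
        alive.remove(node)
        removal_order.append(node)
        n_edges -= degree[node]  # drop this node's remaining edges
        for nbr in adj[node]:
            if nbr in alive:
                degree[nbr] -= 1
                heapq.heappush(heap, (degree[nbr], nbr))
        if alive:
            density = n_edges / len(alive)
            if density > best_density:
                best_density = density
                best_removed = len(removal_order)

    best_nodes = set(adj) - set(removal_order[:best_removed])
    return best_nodes, best_density


# Toy data: three accounts sharing two IPs, plus three unrelated accounts.
edges = [
    ("a1", "ip1"), ("a1", "ip2"),
    ("a2", "ip1"), ("a2", "ip2"),
    ("a3", "ip1"), ("a3", "ip2"),
    ("u1", "ip3"), ("u2", "ip4"), ("u3", "ip5"),
]
nodes, density = greedy_densest_subgraph(edges)
suspicious_accounts = sorted(name for kind, name in nodes if kind == "acct")
print(suspicious_accounts, round(density, 3))
```

On this toy input, the accounts that share both IP addresses form the densest block and are the ones returned. A lazy-deletion heap keeps each peeling step cheap; a metric tailored to fraud behavior would replace the edges/nodes ratio in a realistic setting.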