Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (2): 618-622. DOI: 10.11772/j.issn.1001-9081.2018061382

Credit card fraud classification based on GAN-AdaBoost-DT imbalanced classification algorithm

MO Zan1, GAI Yanrong1, FAN Guanlong2   

  1. School of Management, Guangdong University of Technology, Guangzhou Guangdong 510000, China;
  2. Department of Computer Science, Hong Kong Baptist University, Hong Kong 999077, China
  • Received: 2018-07-03  Revised: 2018-08-21  Online: 2019-02-10  Published: 2019-02-15
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (711710), the "Twelfth Five-Year" National Science and Technology Support Program (2011BAD13B11), and the Guangdong Provincial Regional Demonstration Project for Marine Economic Innovation and Development (GD2013-D01-001).

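  • Corresponding author: GAI Yanrong
  • About the authors: MO Zan (born 1962), male, from Guangzhou, Guangdong, professor, Ph.D.; main research interests: e-commerce, management information systems. GAI Yanrong (born 1994), female, from Yantai, Shandong, master's student; main research interests: machine learning, data mining. FAN Guanlong (born 1994), male, from Shenzhen, Guangdong, master's student; main research interests: machine learning, data mining.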

Abstract: To address the poor performance of traditional single classifiers on imbalanced data, a new binary-class imbalanced classification algorithm based on Generative Adversarial Nets (GAN) and ensemble learning was proposed, namely Generative Adversarial Nets-Adaptive Boosting-Decision Tree (GAN-AdaBoost-DT). Firstly, a GAN was trained to obtain a generative model, which produced minority class samples to reduce the imbalance ratio. Then, the generated minority class samples were brought into the Adaptive Boosting (AdaBoost) learning framework and their weights were adjusted, modifying the AdaBoost model to improve the classification performance of AdaBoost with Decision Tree (DT) as the base classifier. Area Under the Curve (AUC), the area under the receiver operating characteristic curve, was used to evaluate classifier performance on imbalanced classification problems. Experimental results on a credit card fraud data set show that, compared with the synthetic minority over-sampling ensemble learning method, the accuracy of the proposed algorithm increased by 4.5% and its AUC improved by 6.5%; compared with the modified synthetic minority over-sampling ensemble learning method, the accuracy increased by 4.9% and the AUC improved by 5.9%; compared with the random under-sampling ensemble learning method, the accuracy increased by 4.5% and the AUC improved by 5.4%. Experimental results on other data sets from UCI and KEEL show that the proposed algorithm can improve the accuracy of imbalanced classification and the overall classifier performance.
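
The abstract evaluates classifiers with AUC alongside accuracy. The minimal sketch below illustrates why that can matter on imbalanced data; it is not taken from the paper. The 95:5 toy labels, the two hypothetical score vectors, and the helper names `auc_score` and `accuracy` are all assumptions made for illustration, and AUC is computed with the standard pairwise (rank-based) definition.

```python
def auc_score(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs in which the
    positive sample gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Share of samples classified correctly when score >= threshold means fraud."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy imbalanced data (assumed): 95 legitimate transactions (0), 5 frauds (1).
labels = [0] * 95 + [1] * 5

# Hypothetical classifier A: scores every transaction as legitimate.
always_legit = [0.0] * 100

# Hypothetical classifier B: flags 5 legitimate transactions by mistake,
# catches 4 of the 5 frauds, and misses 1.
ranker = [0.1] * 90 + [0.6] * 5 + [0.9] * 4 + [0.2]

for name, scores in [("always_legit", always_legit), ("ranker", ranker)]:
    print(name, "accuracy:", accuracy(scores, labels),
          "AUC:", round(auc_score(scores, labels), 4))
```

In this toy setting the classifier that never flags fraud has the higher accuracy, yet its AUC sits at chance level, while the classifier that actually ranks most frauds above legitimate transactions scores slightly lower on accuracy and much higher on AUC.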

Key words: Generative Adversarial Nets (GAN), ensemble learning, imbalanced classification, binary-class classification, Adaptive Boosting (AdaBoost), Decision Tree (DT), credit card fraud
