Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1363-1369. DOI: 10.11772/j.issn.1001-9081.2025050692
• Artificial Intelligence •
Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG
Received: 2025-06-23
Revised: 2025-07-20
Accepted: 2025-07-23
Online: 2025-08-01
Published: 2026-05-10
Corresponding author: Lizhong DING
About author: LI Xingcan, born in 2002 in Qujing, Yunnan, M. S. candidate. His research interests include imbalanced learning and AI-native applications.
Abstract:
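Machine learning builds social risk prediction models on data modeling and feature recognition, supporting intelligent decision-making in prevention and control systems. In fraud detection, however, the severe imbalance between positive and negative samples limits the effectiveness of traditional methods: under extreme imbalance, a model that predicts every transaction as normal can still reach an accuracy above 99% while its detection rate for fraudulent transactions is close to zero. Moreover, a single model captures fraud features along only specific dimensions and cannot comprehensively predict multiple fraud patterns. Therefore, an ensemble method named ContraStacker is proposed to overcome the limits imposed by data imbalance, compensate for the limitations of a single model, identify multiple fraud patterns accurately, and raise the detection rate of fraudulent transactions. ContraStacker balances the data distribution through oversampling, undersampling, and their combinations to build multiple risk predictors, and introduces a contrastive loss function into the Stacking framework to deeply fuse the model predictions with the original features, which strengthens generalization in extremely imbalanced fraud detection. Experimental results show that, on multiple fraud detection datasets, ContraStacker lowers the False Positive Rate (FPR), the proportion of normal transactions predicted as fraudulent, by 12.7% to 60.6% relative to the best single risk predictor (Tab. 4), while maintaining a low False Negative Rate (FNR), the proportion of fraudulent transactions predicted as normal, indicating application potential in financial transaction security.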
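The accuracy claim in the abstract can be checked with a few lines of arithmetic. The sketch below scores a degenerate classifier that labels every transaction as normal. The class counts (0.5% fraud) and the function name `evaluate_all_normal` are hypothetical, chosen only for illustration, and are not taken from the paper's datasets.

```python
def evaluate_all_normal(n_normal: int, n_fraud: int) -> dict:
    """Score a degenerate classifier that labels every transaction as normal."""
    tp, fn = 0, n_fraud      # no fraudulent transaction is ever flagged
    tn, fp = n_normal, 0     # every normal transaction is passed through
    total = n_normal + n_fraud
    return {
        "accuracy": (tp + tn) / total,
        "detection_rate": tp / (tp + fn),  # recall on the fraud class
        "fnr": fn / (tp + fn),             # fraud predicted as normal
        "fpr": fp / (fp + tn),             # normal predicted as fraud
    }


# Hypothetical class counts: 0.5% of all transactions are fraudulent.
scores = evaluate_all_normal(n_normal=99_500, n_fraud=500)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 99.5% even though no fraud is detected, which is why the paper reports FPR and FNR rather than accuracy.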
Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG. ContraStacker: an ensemble approach for extremely imbalanced fraud detection[J]. Journal of Computer Applications, 2026, 46(5): 1363-1369.
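| Actual class | Predicted: fraudulent transaction | Predicted: normal transaction |
|---|---|---|
| Fraudulent transaction | Correct (TP) | Error of type 1, missed fraud (FN) |
| Normal transaction | Error of type 2, false alarm (FP) | Correct (TN) |

Tab. 1 Confusion matrix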
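Following the definitions in the abstract, FNR = FN / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch of both rates is given below, applied to the PaySim row of Tab. 2. The function names are illustrative and not from the paper.

```python
def false_negative_rate(tp: int, fn: int) -> float:
    """Share of fraudulent transactions that were predicted as normal."""
    return fn / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of normal transactions that were predicted as fraudulent."""
    return fp / (fp + tn)


# Counts copied from the PaySim row of Tab. 2 (best risk predictor).
tp, tn, fp, fn = 1570, 283050, 7681, 69

fnr = false_negative_rate(tp, fn)
fpr = false_positive_rate(fp, tn)
print(f"FNR = {fnr:.2f}, FPR = {fpr:.4f}")
```

Rounded to the precision used in Tab. 2, these reproduce the reported 0.04 and 0.0264 for PaySim.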
| Dataset | TP | TN | FP | FN | $I_{\mathrm{FNR}}$ | $I_{\mathrm{FPR}}$ |
|---|---|---|---|---|---|---|
| Credit Card | 97 | 55 102 | 2 761 | 3 | 0.03 | 0.0310 |
| IBM | 190 | 192 579 | 10 348 | 19 | 0.09 | 0.0510 |
| PaySim | 1 570 | 283 050 | 7 681 | 69 | 0.04 | 0.0264 |
| IEEE-CIS | 3 300 | 73 371 | 40 605 | 834 | 0.20 | 0.3562 |

Tab. 2 Results of the best risk predictor on each dataset
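The risk predictors in Tab. 2 are trained on resampled data. The abstract names oversampling, undersampling, and their combination, but this page does not say which samplers are used. The sketch below therefore uses plain random duplication and random removal as stand-ins. All function names, the toy data, and the `target` size are hypothetical.

```python
import random


def random_oversample(majority, minority, rng):
    """Duplicate minority samples (with replacement) until both classes match in size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra


def random_undersample(majority, minority, rng):
    """Keep a random majority subset of the same size as the minority class."""
    return rng.sample(majority, len(minority)), minority


def combined_resample(majority, minority, rng, target):
    """Shrink the majority and grow the minority to a shared target size.

    Assumes len(minority) <= target <= len(majority).
    """
    kept = rng.sample(majority, target)
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return kept, minority + extra


rng = random.Random(0)  # fixed seed so the sketch is repeatable
normal = [("normal", i) for i in range(1000)]
fraud = [("fraud", i) for i in range(10)]

# One balanced training set per strategy; each would feed its own risk predictor.
training_sets = {
    "oversample": random_oversample(normal, fraud, rng),
    "undersample": random_undersample(normal, fraud, rng),
    "combined": combined_resample(normal, fraud, rng, target=100),
}
for name, (maj, mino) in training_sets.items():
    print(name, len(maj), len(mino))
```

Training one predictor per resampled set gives the ensemble members different views of the class boundary, which is what a stacking layer can then exploit.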
| Dataset | TP | TN | FP | FN | $I_{\mathrm{FNR}}$ | $I_{\mathrm{FPR}}$ |
|---|---|---|---|---|---|---|
| Credit Card | 97 | 55 794 | 1 069 | 3 | 0.03 | 0.0188 |
| IBM | 190 | 192 915 | 10 012 | 19 | 0.09 | 0.0431 |
| PaySim | 1 570 | 287 694 | 3 037 | 69 | 0.04 | 0.0104 |
| IEEE-CIS | 3 300 | 78 525 | 35 451 | 834 | 0.20 | 0.3110 |

Tab. 3 Experimental results of ContraStacker
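The abstract describes ContraStacker as fusing base-model predictions with the original features and adding a contrastive loss to the Stacking framework, but the exact formulation is not given on this page. The sketch below is one possible reading, not the authors' implementation. It assumes fusion by simple concatenation and uses the classic pairwise margin-based contrastive loss. The names `fuse`, `pairwise_contrastive_loss`, `batch_contrastive_loss`, the `margin` value, and the toy vectors are all hypothetical.

```python
import math


def fuse(features, base_probs):
    """Meta-learner input: original features followed by each base predictor's fraud probability."""
    return list(features) + list(base_probs)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(z1, z2, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs at least `margin` apart."""
    d = euclidean(z1, z2)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def batch_contrastive_loss(vectors, labels, margin=1.0):
    """Average the pairwise loss over all distinct pairs in a batch."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    total = sum(
        pairwise_contrastive_loss(vectors[i], vectors[j], labels[i] == labels[j], margin)
        for i, j in pairs
    )
    return total / len(pairs)


# Toy batch: two original features plus the outputs of two base risk predictors.
batch = [
    fuse([0.2, 0.1], [0.9, 0.8]),  # fraud
    fuse([0.3, 0.1], [0.8, 0.9]),  # fraud
    fuse([0.2, 0.2], [0.1, 0.2]),  # normal
]
labels = [1, 1, 0]
loss = batch_contrastive_loss(batch, labels, margin=1.0)
print(f"contrastive loss = {loss:.4f}")
```

In a trained model this term would normally be applied to learned embeddings of the fused vectors and combined with a classification loss. The raw fused vectors are used here only to keep the example short.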
| Dataset | FNR/% | FPR/% (best risk predictor) | FPR/% (ContraStacker) | FPR reduction/% |
|---|---|---|---|---|
| Credit Card | 3 | 3.10 | 1.88 | 39.4 |
| IBM | 9 | 5.10 | 4.31 | 15.5 |
| PaySim | 4 | 2.64 | 1.04 | 60.6 |
| IEEE-CIS | 20 | 35.62 | 31.10 | 12.7 |

Tab. 4 Comparison between best risk predictor and ContraStacker on different datasets
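The reduction column in Tab. 4 is consistent with a relative reduction, (baseline FPR - ContraStacker FPR) / baseline FPR. The short check below recomputes it from the two FPR columns. The function name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100


# FPR/% values copied from Tab. 4: (best risk predictor, ContraStacker).
fpr_percent = {
    "Credit Card": (3.10, 1.88),
    "IBM": (5.10, 4.31),
    "PaySim": (2.64, 1.04),
    "IEEE-CIS": (35.62, 31.10),
}

reductions = {
    name: relative_reduction(base, stacked)
    for name, (base, stacked) in fpr_percent.items()
}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}%")
```

Rounded to one decimal place, the results match the reported 39.4, 15.5, 60.6, and 12.7.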