ContraStacker: an ensemble approach for extremely imbalanced fraud detection

doi:10.11772/j.issn.1001-9081.2025050692

Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1363-1369.

• Artificial Intelligence •

Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG

School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Received: 2025-06-23 Revised: 2025-07-20 Accepted: 2025-07-23 Online: 2025-08-01 Published: 2026-05-10
Contact: Lizhong DING
About authors: LI Xingcan, born in 2002 in Qujing, Yunnan, M. S. candidate. His research interests include imbalanced learning and AI-native applications.
ZHANG Junyu, born in 1996 in Jingzhou, Hubei, Ph. D. candidate. His research interests include multi-table learning and ensemble learning.
ZHANG Chunhui, born in 1998 in Siping, Jilin, Ph. D. candidate. His research interests include contrastive learning and financial data mining.
Supported by:
National Key Research and Development Program of China (2022YFB2703100); Joint Funds of the National Natural Science Foundation of China (U22A2099); General Project of the National Natural Science Foundation of China (62376028); Excellent Young Scientists Fund (Overseas) of the National Natural Science Foundation of China

Abstract:

Machine learning uses data modeling and feature recognition to build social risk prediction models that support intelligent decision-making in risk prevention and control systems. Fraud detection, however, is limited by the severe imbalance between positive and negative samples: under extreme imbalance, a model that predicts every transaction as normal can still exceed 99% accuracy while its detection rate for fraudulent transactions is close to zero. Moreover, a single model captures fraud features along only specific dimensions and struggles to predict multiple fraud patterns comprehensively. To address these problems, the ContraStacker ensemble method is proposed to overcome the limits imposed by data imbalance, compensate for the shortcomings of a single model, identify multiple fraud patterns, and raise the detection rate of fraudulent transactions. ContraStacker balances the data distribution through oversampling, undersampling, and their combinations, constructs multiple risk predictors, and introduces a contrastive loss function into the Stacking framework to fuse model predictions with the original features, which strengthens generalization under extreme imbalance. Experimental results show that, on multiple fraud detection datasets, ContraStacker reduces the False Positive Rate (FPR, the proportion of normal transactions predicted as fraudulent) while maintaining a low False Negative Rate (FNR, the proportion of fraudulent transactions predicted as normal), indicating its potential for application in financial transaction security.

Key words: fraud detection, ensemble learning, imbalanced data, contrastive loss, risk predictor
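
The accuracy problem described in the abstract can be made concrete with a small sketch. The counts below (100,000 transactions, 100 of them fraudulent) and the function name `always_normal_metrics` are hypothetical and chosen only for illustration; they are not taken from the paper's datasets.

```python
def always_normal_metrics(n_total: int, n_fraud: int) -> tuple[float, float]:
    """Accuracy and fraud detection rate of a model that labels every transaction as normal."""
    true_negatives = n_total - n_fraud  # every normal transaction is labeled correctly
    true_positives = 0                  # no fraudulent transaction is ever flagged
    accuracy = (true_positives + true_negatives) / n_total
    detection_rate = true_positives / n_fraud
    return accuracy, detection_rate


# Hypothetical, illustrative counts (not from the paper).
accuracy, detection_rate = always_normal_metrics(n_total=100_000, n_fraud=100)
print(f"accuracy = {accuracy:.3%}, detection rate = {detection_rate:.1%}")
```

With a 0.1% fraud share, accuracy comes out at 99.9% even though no fraudulent transaction is detected, which is why FPR and FNR are more informative measures here than accuracy.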

CLC Number: TP391.7

Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG. ContraStacker: an ensemble approach for extremely imbalanced fraud detection[J]. Journal of Computer Applications, 2026, 46(5): 1363-1369.

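Actual class	Predicted class
	Fraudulent transaction	Normal transaction
Fraudulent transaction	Correct (TP)	Category 1 error (FN)
Normal transaction	Category 2 error (FP)	Correct (TN)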
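
As a minimal sketch of how the two error rates follow from these four counts, assume FNR = FN / (FN + TP) and FPR = FP / (FP + TN), which matches the definitions given in the abstract. The function name `error_rates` is illustrative; the counts are the PaySim row that appears in the results tables of this page.

```python
def error_rates(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (FNR, FPR) computed from confusion-matrix counts."""
    fnr = fn / (fn + tp)  # share of fraudulent transactions predicted as normal
    fpr = fp / (fp + tn)  # share of normal transactions predicted as fraudulent
    return fnr, fpr


# PaySim counts as listed in the results table: TP=1570, TN=283050, FP=7681, FN=69.
fnr, fpr = error_rates(tp=1570, tn=283050, fp=7681, fn=69)
print(f"FNR = {fnr:.2f}, FPR = {fpr:.4f}")
```

Rounded to the table's precision, these values agree with the I_FNR and I_FPR entries listed for PaySim (0.04 and 0.0264).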

Dataset	TP	TN	FP	FN	I_FNR	I_FPR
Credit Card	97	55,102	2,761	3	0.03	0.0310
IBM	190	192,579	10,348	19	0.09	0.0510
PaySim	1,570	283,050	7,681	69	0.04	0.0264
IEEE-CIS	3,300	73,371	40,605	834	0.20	0.3562

Dataset	TP	TN	FP	FN	I_FNR	I_FPR
Credit Card	97	55,794	1,069	3	0.03	0.0188
IBM	190	192,915	10,012	19	0.09	0.0431
PaySim	1,570	287,694	3,037	69	0.04	0.0104
IEEE-CIS	3,300	78,525	35,451	834	0.20	0.3110

Dataset	FNR/%	FPR/% (risk predictor)	FPR/% (ContraStacker)	FPR reduction/%
Credit Card	3	3.10	1.88	39.4
IBM	9	5.10	4.31	15.5
PaySim	4	2.64	1.04	60.6
IEEE-CIS	20	35.62	31.10	12.7
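
The last column appears to be the relative decrease in FPR from the risk predictor to ContraStacker, that is, (before - after) / before × 100. This reading is an assumption rather than something the page states, but recomputing under it reproduces the listed values. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# FPR in percent, copied from the table: (risk predictor, ContraStacker).
fpr_percent = {
    "Credit Card": (3.10, 1.88),
    "IBM": (5.10, 4.31),
    "PaySim": (2.64, 1.04),
    "IEEE-CIS": (35.62, 31.10),
}

for dataset, (before, after) in fpr_percent.items():
    print(f"{dataset}: {relative_reduction(before, after):.1f}%")
```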
