ContraStacker: an ensemble approach for extremely imbalanced fraud detection

Journal of Computer Applications, 2026, Vol. 46, Issue 5: 1363-1369. DOI: 10.11772/j.issn.1001-9081.2025050692

• Artificial intelligence •

Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG

School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Received: 2025-06-23; Revised: 2025-07-20; Accepted: 2025-07-23; Online: 2025-08-01; Published: 2026-05-10
Contact: Lizhong DING
About the authors: LI Xingcan, born in 2002, a native of Qujing, Yunnan, M.S. candidate. His research interests include imbalanced learning and AI-native applications.
ZHANG Junyu, born in 1996, a native of Jingzhou, Hubei, Ph.D. candidate. His research interests include multi-table learning and ensemble learning.
ZHANG Chunhui, born in 1998, a native of Siping, Jilin, Ph.D. candidate. His research interests include contrastive learning and financial data mining.
Supported by:
National Key Research and Development Program of China (2022YFB2703100); Joint Funds of National Natural Science Foundation of China (U22A2099); General Project of National Natural Science Foundation of China (62376028); Excellent Young Scientists Fund (Overseas) of National Natural Science Foundation of China

Abstract:

Machine learning builds social risk prediction models through data modeling and feature recognition, enabling intelligent decision-making in risk prevention and control systems. Fraud detection, however, is constrained by the severe imbalance between positive and negative samples: under extreme imbalance, a model that predicts every transaction as normal can still exceed 99% accuracy while its detection rate for fraudulent transactions is close to zero. Moreover, a single model captures fraud features only along specific dimensions and struggles to predict multiple fraud patterns comprehensively. To address these problems, the ContraStacker ensemble method is proposed to overcome the limitations of data imbalance, compensate for the shortcomings of a single model, and identify various fraud patterns accurately, thereby improving the fraud detection rate. ContraStacker balances the data distribution through oversampling, undersampling, and their combinations, constructs multiple risk predictors, and introduces a contrastive loss function into the Stacking framework to deeply fuse model predictions with the original features, which enhances generalization and addresses the challenge of extreme imbalance in fraud detection. Experimental results on multiple fraud detection datasets show that ContraStacker effectively reduces the False Positive Rate (FPR, the proportion of normal transactions predicted as fraudulent) while maintaining a low False Negative Rate (FNR, the proportion of fraudulent transactions predicted as normal), demonstrating its potential for application in financial transaction security.
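The accuracy paradox described in the abstract can be illustrated with a small Python sketch. The data here are hypothetical (10 000 transactions with a 0.5% fraud rate, chosen only for illustration and not taken from the paper), and the `evaluate` helper is an assumed name, not part of ContraStacker.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, fraud detection rate); label 1 = fraud, 0 = normal."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    fraud_total = sum(t == 1 for t in y_true)
    fraud_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    detection_rate = fraud_caught / fraud_total if fraud_total else 0.0
    return accuracy, detection_rate


# Hypothetical data: 10 000 transactions, 50 of them fraudulent (0.5%).
y_true = [1] * 50 + [0] * 9950

# A degenerate "model" that labels every transaction as normal.
y_all_normal = [0] * len(y_true)

accuracy, detection_rate = evaluate(y_true, y_all_normal)
print(f"accuracy={accuracy:.4f}, detection rate={detection_rate:.4f}")
# prints "accuracy=0.9950, detection rate=0.0000"
```

Accuracy alone rewards the degenerate predictor, which is why the results are reported as FNR and FPR rather than accuracy.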

Key words: fraud detection, ensemble learning, imbalanced data, contrastive loss, risk predictor

CLC Number:

TP391.7

Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG. ContraStacker: an ensemble approach for extremely imbalanced fraud detection [J]. Journal of Computer Applications, 2026, 46(5): 1363-1369.

Actual	Predicted
	Fraudulent transaction	Normal transaction
Fraudulent transaction	Correct (TP)	Type I error (FN)
Normal transaction	Type II error (FP)	Correct (TN)
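The two error rates defined in the abstract follow directly from the confusion matrix above: FNR is the share of fraudulent transactions predicted as normal, and FPR is the share of normal transactions predicted as fraudulent. A minimal Python sketch, assuming illustrative function names and using the PaySim counts from the first results table (TP 1 570, TN 283 050, FP 7 681, FN 69):

```python
def false_negative_rate(tp, fn):
    """FNR = FN / (FN + TP): fraudulent transactions predicted as normal."""
    return fn / (fn + tp)


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): normal transactions predicted as fraudulent."""
    return fp / (fp + tn)


# PaySim row of the first results table.
tp, tn, fp, fn = 1570, 283050, 7681, 69

fnr = false_negative_rate(tp, fn)
fpr = false_positive_rate(fp, tn)
print(round(fnr, 2), round(fpr, 4))
# prints "0.04 0.0264"
```

The rounded values agree with the I_FNR and I_FPR entries reported for PaySim in that table.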

Dataset	TP	TN	FP	FN	I_FNR	I_FPR
Credit Card	97	55 102	2 761	3	0.03	0.0310
IBM	190	192 579	10 348	19	0.09	0.0510
PaySim	1 570	283 050	7 681	69	0.04	0.0264
IEEE-CIS	3 300	73 371	40 605	834	0.20	0.3562

Dataset	TP	TN	FP	FN	I_FNR	I_FPR
Credit Card	97	55 794	1 069	3	0.03	0.0188
IBM	190	192 915	10 012	19	0.09	0.0431
PaySim	1 570	287 694	3 037	69	0.04	0.0104
IEEE-CIS	3 300	78 525	35 451	834	0.20	0.3110

Dataset	FNR/%	FPR/% (risk predictor)	FPR/% (ContraStacker)	FPR reduction/%
Credit Card	3	3.10	1.88	39.4
IBM	9	5.10	4.31	15.5
PaySim	4	2.64	1.04	60.6
IEEE-CIS	20	35.62	31.10	12.7
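The last column of the table above appears to be the relative drop in FPR from the risk predictor to ContraStacker. This reading is an assumption, since the page does not state the formula, but it reproduces every reported value. A short Python check, where `relative_reduction` is an illustrative name:

```python
def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100


# (risk predictor FPR/%, ContraStacker FPR/%) copied from the table above.
fpr_by_dataset = {
    "Credit Card": (3.10, 1.88),
    "IBM": (5.10, 4.31),
    "PaySim": (2.64, 1.04),
    "IEEE-CIS": (35.62, 31.10),
}

reductions = {
    name: round(relative_reduction(predictor, contrastacker), 1)
    for name, (predictor, contrastacker) in fpr_by_dataset.items()
}
for name, value in reductions.items():
    print(f"{name}: {value}%")
```

For Credit Card, for example, (3.10 - 1.88) / 3.10 is about 39.4%, matching the table, while the FNR column is unchanged between the two methods.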
