Journal of Computer Applications ›› 2026, Vol. 46 ›› Issue (5): 1363-1369. DOI: 10.11772/j.issn.1001-9081.2025050692

• Artificial intelligence •    

ContraStacker: an ensemble approach for extremely imbalanced fraud detection

Xingcan LI, Lizhong DING, Junyu ZHANG, Chunhui ZHANG

  1. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2025-06-23 Revised: 2025-07-20 Accepted: 2025-07-23 Online: 2025-08-01 Published: 2026-05-10
  • Contact: Lizhong DING
  • About the authors: LI Xingcan, born in 2002, M.S. candidate. His research interests include imbalanced learning and AI-native applications.
    ZHANG Junyu, born in 1996, Ph.D. candidate. His research interests include multi-table learning and ensemble learning.
    ZHANG Chunhui, born in 1998, Ph.D. candidate. His research interests include contrastive learning and financial data mining.
  • Supported by:
    National Key Research and Development Program of China (2022YFB2703100); Joint Funds of National Natural Science Foundation of China (U22A2099); General Project of National Natural Science Foundation of China (62376028); Excellent Young Scientists Fund (Overseas) of National Natural Science Foundation of China

ContraStacker:一种极度不平衡欺诈检测的集成方法

李星灿, 丁立中, 张君宇, 张春晖

  1. 北京理工大学 计算机学院,北京 100081
  • 通讯作者: 丁立中
  • 作者简介:李星灿(2002—),男,云南曲靖人,硕士研究生,主要研究方向:不平衡学习、AI原生应用
    张君宇(1996—),男,湖北荆州人,博士研究生,主要研究方向:多表格学习、集成学习
    张春晖(1998—),男,吉林四平人,博士研究生,主要研究方向:对比学习、金融数据挖掘。
  • 基金资助:
    国家重点研发计划项目(2022YFB2703100);国家自然科学基金联合基金项目(U22A2099);国家自然科学基金面上项目(62376028);国家自然科学基金优秀青年科学基金资助项目(海外)

Abstract:

Machine learning builds social risk prediction models through data modeling and feature recognition, enabling intelligent decision-making in risk prevention and control systems. Fraud detection, however, is constrained by a severe imbalance between positive (fraudulent) and negative (normal) samples. Under extreme imbalance, a model that predicts every transaction as normal can still exceed 99% accuracy while its detection rate for fraudulent transactions is close to zero. Moreover, a single model captures fraud features only along specific dimensions and struggles to cover multiple fraud patterns. To address these problems, the ContraStacker ensemble method is proposed to overcome the limitations of data imbalance, compensate for the shortcomings of single models, and identify diverse fraud patterns, thereby improving the fraud detection rate. ContraStacker balances the data distribution through oversampling, undersampling, and combinations of the two, constructs multiple risk predictors, and introduces a contrastive loss function into the Stacking framework to deeply fuse model predictions with the original features, which enhances generalization under extreme imbalance. Experimental results on multiple fraud detection datasets show that ContraStacker effectively reduces the False Positive Rate (FPR, the proportion of normal transactions predicted as fraudulent) while maintaining a low False Negative Rate (FNR, the proportion of fraudulent transactions predicted as normal), demonstrating its potential for application in financial transaction security.

Key words: fraud detection, ensemble learning, imbalanced data, contrastive loss, risk predictor
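
The accuracy paradox and the two error rates named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names (`confusion_counts`, `rates`) and the toy data (10,000 transactions, 50 of them fraudulent) are assumptions chosen only for illustration. Fraud is encoded as 1 and normal as 0.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), with fraud encoded as 1 and normal as 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def rates(y_true, y_pred):
    """Compute accuracy, FPR and FNR from true and predicted labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # FPR: share of normal transactions predicted as fraudulent.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # FNR: share of fraudulent transactions predicted as normal.
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical toy data: 50 fraudulent transactions out of 10,000 (0.5%).
y_true = [1] * 50 + [0] * 9950

# Trivial baseline that labels every transaction as normal.
all_normal = [0] * len(y_true)

baseline = rates(y_true, all_normal)
print(baseline)
```

On this toy data the all-normal baseline is correct on 9,950 of 10,000 transactions, yet its FNR is 1.0 because every fraudulent transaction is missed. This is why the abstract reports FPR and FNR rather than accuracy.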

摘要:

机器学习依托数据建模与特征识别技术构建社会风险预测模型,赋能防控体系智能决策。然而,欺诈检测任务因正负样本数量严重不平衡导致传统方法效果受限,在极端不平衡的情况下,即便模型将所有交易预测为正常交易,准确率仍可高达99%以上,但欺诈交易的检出率却接近零;且单一模型仅能捕捉特定维度的欺诈特征,难以全面预测多种欺诈模式。因此,提出ContraStacker集成方法,突破数据不平衡限制,弥补单一模型的局限,精准识别多种欺诈模式,提升欺诈交易检出率。ContraStacker通过过采样、欠采样及其组合策略平衡数据分布,构建多风险预警器,并在Stacking框架中引入对比损失函数,深度融合模型预测结果与原始特征,增强模型泛化能力,成功应对极端不平衡的欺诈检测挑战。实验结果表明,ContraStacker在多个欺诈检测数据集上有效降低了误报率(FPR)(将正常交易预测为欺诈交易的比例),同时保持较低的漏检率(FNR)(将欺诈交易预测为正常交易的比例),在金融交易安全中具备应用潜力。

关键词: 欺诈检测, 集成学习, 不平衡数据, 对比损失, 风险预警器

CLC Number: