Customer churn prediction model integrating hierarchical graph neural network and specific feature learning

doi:10.11772/j.issn.1001-9081.2025020202

Abstract

Customer churn is a serious problem in inclusive finance, and existing customer retention models fall short in both prediction accuracy and interpretability. To address this, a customer churn prediction model integrating a Hierarchical Graph Neural Network (HGNN) and Specific Feature Learning (SFL), named HGNN-SFLN (HGNN-SFL Network), was proposed to improve prediction capability and the understanding of feature interactions. First, to handle data imbalance, a hybrid sampling strategy was introduced, and feature-level weights were applied to the different feature categories so that all data types are used effectively. Second, a hierarchical graph was used to strengthen the correlations between features, and an SFL module based on self-attention was built to improve the model's handling of categorical features and its analysis of feature interactions. This module enables the model to identify key features accurately and to capture the complex interactions between them, which improves its prediction decisions. Experimental results on multiple real-world financial datasets show that, compared with mainstream models such as LightGBM (Light Gradient Boosting Machine) and Deep Neural Network (DNN), the proposed model achieves the best results on key metrics such as Area Under the Curve (AUC). Furthermore, the proposed model shows significant advantages over the comparison models in identifying critical churn-related features and in capturing complex feature interactions.

Key words: customer churn prediction, data imbalance, feature interaction modeling, specific feature, Hierarchical Graph Neural Network (HGNN)
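To make the idea of an attention-based feature-interaction module more concrete, the sketch below treats each embedded feature as a token and applies standard scaled dot-product self-attention (Vaswani et al., 2017); entry `attn[i][j]` can then be read as how strongly feature i attends to feature j. This is a minimal illustration under stated assumptions, not the authors' SFL module: the feature embeddings, the projection matrices `wq`, `wk`, `wv` (identity here, learned in practice) and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def feature_self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over feature tokens.

    x          : one row per feature embedding (n_features x d)
    wq, wk, wv : hypothetical projection matrices (d x d_k)
    Returns (outputs, attention); attention[i][j] is the weight feature i puts on feature j.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wq[0])
    scores = matmul(q, transpose(k))  # pairwise feature-feature scores
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(attn, v), attn


# Three hypothetical feature embeddings (e.g. age, balance, product count), d = 2
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = feature_self_attention(x, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` is a probability distribution over the features, which is what makes attention weights usable as a rough, per-sample view of which feature pairs interact.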

CLC Number: TP391

Yanqun LU, Yiyi ZHAO. Customer churn prediction model integrating hierarchical graph neural network and specific feature learning[J]. Journal of Computer Applications, 2025, 45(9): 3057-3066.

Dataset	Model	AUC (%)	F1-Score (%)	AP (%)	KS (%)
A	LightGBM	87.77	64.97	71.12	59.05
A	DNN	87.69	65.02	70.29	59.09
A	CNN	87.69	64.85	70.59	59.01
A	ResNet	87.71	64.65	69.85	59.46
A	GCN	80.64	57.33	60.58	47.52
A	GAT	86.78	63.89	68.79	57.81
A	TabNet	87.00	63.40	69.05	57.91
A	SFLN	88.82	66.68	73.38	61.28
A	HGNN-SFLN	89.69	67.13	73.84	59.75
B	LightGBM	90.99	58.56	59.96	67.97
B	DNN	92.07	55.17	57.27	68.24
B	CNN	92.33	56.46	60.10	68.76
B	ResNet	90.23	58.38	59.10	67.62
B	GCN	85.75	51.36	54.58	62.52
B	GAT	89.08	51.89	56.79	61.81
B	TabNet	92.80	57.28	59.59	70.13
B	SFLN	93.51	62.94	67.18	71.90
B	HGNN-SFLN	93.79	63.88	67.33	71.21
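The tables report AUC, F1-Score, average precision (AP) and the Kolmogorov-Smirnov (KS) statistic, all scaled to percentages. As a rough guide to how the two threshold-free ranking metrics relate, the sketch below derives both from the same ROC curve: AUC is the area under the (FPR, TPR) curve, and KS is the largest gap TPR - FPR over all thresholds. The labels and scores are invented toy values, and the helper names are hypothetical; libraries such as scikit-learn provide equivalent, well-tested routines.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping the threshold over distinct scores, high to low.

    Assumes both classes are present (label 1 = churned, 0 = retained).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all samples tied at this score together so ties form one ROC step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(labels: List[int], scores: List[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def ks(labels: List[int], scores: List[float]) -> float:
    """Kolmogorov-Smirnov statistic: maximum of TPR - FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in roc_points(labels, scores))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(f"AUC = {100 * auc(labels, scores):.2f}, KS = {100 * ks(labels, scores):.2f}")
# prints "AUC = 88.89, KS = 66.67"
```

Because both metrics come from the same curve but summarize it differently (area versus the single best separation point), a model can lead on AUC while trailing on KS.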

Model	AUC (%)	F1-Score (%)	AP (%)	KS (%)
SFLN	93.51	62.94	67.18	71.90
HGNN-SFLN-2H	88.44	37.68	28.08	61.59
HGNN-SFLN-4H	91.30	48.33	51.69	63.45
HGNN-SFLN-8H	93.79	63.88	67.33	71.21
HGNN-SFLN-16H	85.36	37.02	25.44	59.80

Method	AUC (%)	F1-Score (%)	AP (%)	KS (%)
No processing	84.57	49.88	53.87	59.11
Oversampling	90.11	47.09	43.51	66.41
Undersampling	87.01	44.53	46.55	62.53
Hybrid sampling	89.52	50.83	54.89	68.00
Proposed method	93.51	62.94	67.18	71.90
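The comparison above contrasts leaving the data unbalanced, oversampling, undersampling and hybrid sampling. The exact hybrid strategy of the paper is not described in this section, so the sketch below only shows the generic idea under stated assumptions: every class is brought to a common size `target_per_class` (a hypothetical parameter) by randomly undersampling larger classes without replacement and randomly oversampling smaller ones by duplicating rows. Plain random duplication is used here for simplicity; synthetic methods such as SMOTE are a common alternative.

```python
import random
from typing import List, Sequence, Tuple


def hybrid_sample(X: Sequence, y: Sequence[int], target_per_class: int,
                  seed: int = 0) -> Tuple[List, List[int]]:
    """Random hybrid resampling (hypothetical helper, not the paper's method).

    Classes with more than target_per_class rows are undersampled without
    replacement; smaller classes keep all their rows and are topped up with
    randomly duplicated rows until they reach target_per_class.
    """
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target_per_class:
            chosen = rng.sample(rows, target_per_class)  # undersample the majority
        else:
            extra = [rng.choice(rows) for _ in range(target_per_class - len(rows))]
            chosen = rows + extra  # keep every original row, then oversample
        X_out.extend(chosen)
        y_out.extend([label] * target_per_class)

    # Shuffle rows and labels with the same permutation so they stay aligned.
    order = list(range(len(y_out)))
    rng.shuffle(order)
    return [X_out[i] for i in order], [y_out[i] for i in order]


# Hypothetical imbalanced data: 90 retained (0) vs 10 churned (1) customers
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10
# Meet in the middle: shrink the majority and grow the minority to 50 rows each
Xs, ys = hybrid_sample(X, y, target_per_class=50)
print(ys.count(0), ys.count(1))  # prints "50 50"
```

Resampling of this kind is normally applied to the training split only, so that the evaluation metrics are still computed on the original class distribution.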