Credit scoring model based on enhanced multi-dimensional and multi-grained cascade forest

doi:10.11772/j.issn.1001-9081.2020111796

Journal of Computer Applications, 2021, Vol. 41, Issue (9): 2539-2544. DOI: 10.11772/j.issn.1001-9081.2020111796

Special topic: Artificial intelligence

BIAN Lingzhi, WANG Zhijie

School of Information Science and Technology, Donghua University, Shanghai 201620, China

Received: 2020-11-17 Revised: 2021-01-13 Online: 2021-05-08 Published: 2021-09-10
Corresponding author: WANG Zhijie
About the authors: BIAN Lingzhi (born 1997), male, from Shanghai, M.S. candidate; main research interests: machine learning, financial big data. WANG Zhijie (born 1969), male, from Taizhou, Zhejiang, professor, Ph.D.; main research interests: neural dynamics, deep learning.
Supported by:
This work is partially supported by the Shanghai Philosophy and Social Science Planning Project (2019BGL004).

Abstract

Abstract: Credit risk is one of the main financial risks faced by commercial banks, yet traditional credit scoring methods based on statistical learning cannot make effective use of existing feature learning methods, which results in low prediction accuracy. To solve this problem, an enhanced multi-dimensional and multi-grained cascade forest method was proposed to build a credit scoring model. Drawing on the idea of residual learning, a multi-dimensional and multi-grained cascade residual Forest (grcForest) model was built, which greatly increased the number of extracted features. In addition, multi-dimensional and multi-grained scanning was used to extract as many features as possible from the raw data, which improved the efficiency of feature extraction. The experimental results of each model were evaluated by metrics such as Area Under Curve (AUC) and accuracy, and the proposed model was compared with existing statistical learning and machine learning algorithms on four different credit scoring datasets. The AUC of the proposed model was on average 1.13% higher than that of the Light Gradient Boosting Machine (LightGBM) method and 1.44% higher than that of the eXtreme Gradient Boosting (XGBoost) method. The experimental results show that the proposed model achieves the best prediction performance.

Key words: credit scoring, feature learning, residual learning, multi-dimensional and multi-grained, cascade forest
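
The abstract evaluates models by AUC and accuracy. As a rough illustration of what these two metrics measure, the following minimal sketch computes both in plain Python. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, the convention that label 1 marks the positive class, and the toy data are all assumptions made for this example.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_score(labels: Sequence[int], scores: Sequence[float],
                   threshold: float = 0.5) -> float:
    """Share of samples whose thresholded score matches the label.
    The 0.5 threshold is an assumption, not a value from the paper."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical toy data: label 1 is the positive class.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auc_score(labels, scores))       # 0.75
print(accuracy_score(labels, scores))  # 0.75
```

AUC depends only on how the scores rank the samples, so it does not change with the decision threshold, whereas accuracy does. The pairwise loop is quadratic in the number of samples and is meant for illustration only; a rank-based computation is the usual choice for datasets of realistic size.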

CLC number: TP181

BIAN Lingzhi, WANG Zhijie. Credit scoring model based on enhanced multi-dimensional and multi-grained cascade forest[J]. Journal of Computer Applications, 2021, 41(9): 2539-2544.
