Journal of Computer Applications (计算机应用) ›› 2021, Vol. 41 ›› Issue (9): 2539-2544. DOI: 10.11772/j.issn.1001-9081.2020111796

Special topic: Artificial intelligence

Credit scoring model based on enhanced multi-dimensional and multi-grained cascade forest

BIAN Lingzhi, WANG Zhijie

  1. School of Information Science and Technology, Donghua University, Shanghai 201620, China
  • Received: 2020-11-17 Revised: 2021-01-13 Published: 2021-09-10 Online: 2021-05-08
  • Corresponding author: WANG Zhijie
  • About the authors: BIAN Lingzhi (born 1997), male, from Shanghai, is a master's student whose research interests are machine learning and financial big data. WANG Zhijie (born 1969), male, from Taizhou, Zhejiang, is a professor with a Ph.D. whose research interests are neural dynamics and deep learning.
  • Supported by:
    This work is partially supported by the Shanghai Philosophy and Social Science Planning Project (2019BGL004).

Abstract: Credit risk is one of the main financial risks faced by commercial banks, yet traditional credit scoring methods based on statistical learning cannot make effective use of existing feature learning methods, so their prediction accuracy is low. To address this problem, an enhanced multi-dimensional and multi-grained cascade forest method was proposed for building a credit scoring model. Drawing on the idea of residual learning, a multi-dimensional and multi-grained cascade residual forest (grcForest) model was built, which greatly increased the number of extracted features. In addition, multi-dimensional and multi-grained scanning was used to extract as many features as possible from the raw data, which improved the efficiency of feature extraction. The proposed model was compared with existing statistical learning and machine learning algorithms on four different credit scoring datasets and evaluated by metrics including Area Under Curve (AUC) and accuracy. On average, the AUC of the proposed model was 1.13% higher than that of the Light Gradient Boosting Machine (LightGBM) and 1.44% higher than that of eXtreme Gradient Boosting (XGBoost). The experimental results show that the proposed model achieved the best prediction performance among the compared methods.

Key words: credit scoring, feature learning, residual learning, multi-dimensional and multi-grained, cascade forest
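The abstract evaluates models by AUC and accuracy. As a minimal sketch of how these two metrics can be computed for a binary credit scoring task, the snippet below uses only the Python standard library. It is not the authors' evaluation code: the function names `auc` and `accuracy`, the 0.5 decision threshold, the convention that label 1 marks the positive class, and the toy data are all assumptions made for illustration.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Assumes labels are 0/1, with 1 as the positive class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label.

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: six applicants with true labels and model scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.7, 0.3, 0.2]

print(f"AUC      = {auc(labels, scores):.4f}")
print(f"accuracy = {accuracy(labels, scores):.4f}")
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based formulation would be preferable on datasets of realistic size. AUC does not depend on the threshold, whereas accuracy does, which is one reason both metrics are commonly reported together.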

CLC Number: