Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (10): 2888-2892. DOI: 10.11772/j.issn.1001-9081.2019020827

• Artificial intelligence

Classification of online loan based on improved cost-sensitive decision tree

GUO Bingnan, WU Guangchao   

  1. College of Mathematics, South China University of Technology, Guangzhou Guangdong 510640, China
  • Received: 2019-03-22 Revised: 2019-05-09 Online: 2019-06-03 Published: 2019-10-10

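  • Corresponding author: GUO Bingnan
  • About the authors: GUO Bingnan (born 1992), female, from Sanmenxia, Henan, M.S. candidate; research interests: data mining, machine learning. WU Guangchao (born 1972), male, from Shantou, Guangdong, associate professor, Ph.D.; research interests: data mining, machine learning.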

Abstract: In online loan user data sets, the numbers of successful and failed loan users are severely imbalanced. Traditional machine learning algorithms focus on overall classification accuracy when solving such problems, which leads to low prediction accuracy for successful loan users. To address this problem, the class distribution was added to the calculation of the sensitivity function of the cost-sensitive decision tree, so as to weaken the impact of the numbers of positive and negative samples on the misclassification cost, and an improved cost-sensitive decision tree based on ID3 (ID3cs) was constructed. With the improved cost-sensitive decision tree as the base classifier and classification accuracy as the criterion, the better-performing base classifiers were selected and integrated with the classifier generated in the last stage to obtain the final classifier. Experimental results show that, compared with existing algorithms commonly used for such problems (such as the MetaCost algorithm, the cost-sensitive decision tree and the AdaCost algorithm), the improved cost-sensitive decision tree reduces the overall misclassification rate in classifying online loan users and has stronger generalization ability.

Key words: imbalance, cost-sensitive, online loan, ensemble learning, decision tree
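
Illustrative note: the abstract does not give the sensitivity function itself, so the Python sketch below is only a rough illustration of the general idea of letting the class distribution adjust misclassification costs. The inverse-prior scaling and the helper names `class_priors`, `adjusted_costs` and `leaf_label` are assumptions made for this example, not the formula or code of the paper.

```python
from collections import Counter


def class_priors(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def adjusted_costs(base_cost, priors):
    """Scale each class's misclassification cost by the inverse of its prior.

    ASSUMPTION: inverse-prior scaling is a stand-in for the paper's
    class-distribution term, which the abstract does not specify.
    """
    return {c: base_cost[c] / priors[c] for c in priors}


def leaf_label(leaf_labels, costs):
    """Choose the leaf label that minimizes the total misclassification cost."""
    counts = Counter(leaf_labels)

    def total_cost(pred):
        # Every sample whose true class differs from `pred` incurs its class cost.
        return sum(costs[c] * n for c, n in counts.items() if c != pred)

    return min(costs, key=total_cost)


# Hypothetical imbalanced data: 90 failed loans (0) and 10 successful loans (1).
train_labels = [0] * 90 + [1] * 10
uniform = {0: 1.0, 1: 1.0}
adjusted = adjusted_costs(uniform, class_priors(train_labels))

leaf = [0, 0, 0, 1]
plain_choice = leaf_label(leaf, uniform)      # majority class wins
adjusted_choice = leaf_label(leaf, adjusted)  # minority class is favored
```

With uniform costs the leaf is labeled with the majority class, while the adjusted costs make an error on the rare class expensive enough that the same leaf is labeled with the minority class.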

CLC Number: