Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (10): 3042-3047. DOI: 10.11772/j.issn.1001-9081.2018030673

• Frontier, interdisciplinary and comprehensive applications •

P2P loan default prediction model based on the TF-IDF algorithm

ZHANG Ning1, CHEN Qin1,2

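  1. School of Information, Central University of Finance and Economics, Beijing 100081;
    2. IT Department, China Development Bank Financial Leasing Company Limited, Shenzhen, Guangdong 518038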
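  • Received: 2018-04-02 Revised: 2018-05-15 Online: 2018-10-10 Published: 2018-10-13
  • Corresponding author: CHEN Qin
  • About the authors: ZHANG Ning (1975-), female, born in Linchuan, Jiangxi, professor, Ph.D.; main research interests: Internet finance, personal information protection, service outsourcing. CHEN Qin (1977-), male, born in Nanchang, Jiangxi, senior engineer, Ph.D. candidate; main research interests: financial technology, intelligent investment, big data analytics, information retrieval.
  • Supported by:
    National Key Research and Development Program of China (2017YFB1400701).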

P2P loan default prediction model based on TF-IDF algorithm

ZHANG Ning1, CHEN Qin1,2   

  1. School of Information, Central University of Finance and Economics, Beijing 100081, China;
    2. IT Department, China Development Bank Financial Leasing Company Limited, Shenzhen, Guangdong 518038, China
  • Received: 2018-04-02 Revised: 2018-05-15 Online: 2018-10-10 Published: 2018-10-13
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFB1400701).

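Abstract: Existing P2P loan default prediction models are constrained by the information asymmetry between borrowers and lenders and ignore the differences among investors. To address this problem, a P2P loan default prediction model based on the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm from information retrieval is proposed. First, building on investment utility theory, a loan default prediction model based on investor utility is established from information such as investors' historical investment return rates and their loan interest-rate bids. Then, borrowing from the TF-IDF algorithm, an inverse investment proportion factor is constructed for each investor to quantify the differences among investors and to optimize the investor weight factor in the model. Experimental results show that the prediction accuracy of the model is about 6% higher on average than that of other models, and that it remains the best on different test data sets.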

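Key words: loan default prediction, utility theory, information retrieval, Term Frequency-Inverse Document Frequency (TF-IDF), Peer-to-Peer (P2P) lending, Area Under Curve (AUC)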

Abstract: Current P2P loan default prediction models are limited by the information asymmetry between lenders and borrowers and do not take differences among lenders into account. To address this, a P2P loan default prediction model based on the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm from information retrieval was proposed. Firstly, based on investment utility theory, a loan default prediction model was established using information such as each lender's historical investment return rate and bid interest rate. Secondly, drawing on the TF-IDF algorithm, a reverse investment scale factor was constructed for each lender to quantify the differences among lenders, and the lender weight factor in the model was optimized accordingly. Experimental results show that the prediction accuracy of the proposed model is about 6% higher on average than that of the other models, and that it remains the best performer on different test data sets.

Key words: loan default prediction, utility theory, information retrieval, Term Frequency-Inverse Document Frequency (TF-IDF), Peer-to-Peer lending (P2P lending), Area Under Curve (AUC)
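
The abstract names a TF-IDF-inspired lender weight but does not state its formula, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes the weight mirrors the classic inverse document frequency, log(N / n), with loans playing the role of documents and lenders the role of terms: a lender who bids on almost every loan carries little signal, while a selective lender carries more. The function names, the `(lender_id, loan_id)` bid format, and the weighted-average score are all hypothetical.

```python
import math
from collections import defaultdict


def inverse_investment_factor(bids):
    """IDF-style weight per lender (assumed form, not from the paper).

    bids: iterable of (lender_id, loan_id) pairs.
    Returns {lender_id: log(N / n_lender)}, where N is the number of
    distinct loans and n_lender is the number of distinct loans that
    lender bid on.
    """
    loans_by_lender = defaultdict(set)
    all_loans = set()
    for lender, loan in bids:
        loans_by_lender[lender].add(loan)
        all_loans.add(loan)
    total_loans = len(all_loans)
    return {
        lender: math.log(total_loans / len(loans))
        for lender, loans in loans_by_lender.items()
    }


def weighted_lender_score(loan_id, bids, hist_return, factor):
    """Hypothetical loan score: average of the lenders' historical
    return rates, weighted by their inverse investment factor.
    Falls back to a plain average if every weight is zero.
    Raises ValueError if the loan received no bids.
    """
    lenders = sorted({lender for lender, loan in bids if loan == loan_id})
    if not lenders:
        raise ValueError(f"no bids recorded for loan {loan_id!r}")
    total_weight = sum(factor[lender] for lender in lenders)
    if total_weight == 0:
        return sum(hist_return[lender] for lender in lenders) / len(lenders)
    weighted = sum(factor[lender] * hist_return[lender] for lender in lenders)
    return weighted / total_weight


if __name__ == "__main__":
    # Toy data: lender "a" bids on every loan, lender "b" on one only.
    sample_bids = [("a", "L1"), ("a", "L2"), ("a", "L3"), ("a", "L4"), ("b", "L1")]
    sample_returns = {"a": 0.02, "b": 0.10}
    weights = inverse_investment_factor(sample_bids)
    print(weights)
    print(weighted_lender_score("L1", sample_bids, sample_returns, weights))
```

In this toy data the indiscriminate lender "a" receives zero weight, so the score for loan L1 is driven entirely by the selective lender "b". How the paper combines such a factor with lender utility and bid interest rates is not specified in the abstract.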

CLC Number: