Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (9): 2784-2788. DOI: 10.11772/j.issn.1001-9081.2019030571

• Frontier & interdisciplinary applications •

Application of KNN algorithm based on value difference metric and clustering optimization in bank customer behavior prediction

LI Bo<sup>1,2</sup>, ZHANG Xiao<sup>1,2</sup>, YAN Jingyi<sup>3</sup>, LI Kewei<sup>1</sup>, LI Heng<sup>1,2</sup>, LING Yulong<sup>1,2</sup>, ZHANG Yong<sup>1,2</sup>   

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China;
    2. Ministry of Industry and Information Technology Key Laboratory of Big Data Storage and Management (Northwestern Polytechnical University), Xi'an Shaanxi 710129, China;
    3. School of Management, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
  • Received: 2019-04-08 Revised: 2019-06-02 Online: 2019-09-10 Published: 2019-06-10
  • Supported by:

    This work is partially supported by the National Key Research and Development Program of China (2018YFB1004401).

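  • Corresponding author: LI Bo
  • About the authors: LI Bo (born 1994), male, from Longxi, Gansu, M.S. candidate, CCF member; research interests: cloud storage and data mining. ZHANG Xiao (born 1978), male, from Xinxiang, Henan, associate professor, Ph.D., CCF member; research interest: storage systems. YAN Jingyi (born 1993), female (Hui ethnicity), from Guilin, Guangxi, M.S.; research interest: technology innovation management. LI Kewei (born 1993), male, from Yunmeng, Hubei, M.S. candidate; research interest: data mining. LI Heng (born 1993), male, from Zhoukou, Henan, M.S. candidate; research interest: data mining. LING Yulong (born 1995), male, from Suzhou, Anhui, M.S. candidate; research interest: data mining. ZHANG Yong (born 1995), male, from Lu'an, Anhui, M.S. candidate; research interest: data mining.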

Abstract:

To improve the accuracy of behavior prediction for loan customers in finance, and to address the inadequate handling of non-numerical factors by the traditional K-Nearest Neighbors (KNN) algorithm in data analysis, an improved KNN algorithm based on Value Difference Metric (VDM) distance and iterative optimization of clustering results was proposed. First, the collected data were clustered by a KNN algorithm based on VDM distance; then, the clustering results were analyzed iteratively; finally, the prediction accuracy was improved through joint training. Experiments on customer data collected by Portuguese retail banks from 2008 to 2013 show that, compared with the traditional KNN algorithm, the FCD-KNN (Feature Correlation Difference KNN) algorithm, the Gaussian Naive Bayes algorithm, and the Gradient Boosting algorithm, the improved KNN algorithm has better performance and stability, and is of considerable practical value for predicting customer behavior from bank data.
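
The abstract does not spell out the distance computation, so the following is only a minimal sketch of a KNN vote over categorical attributes using the textbook form of the Value Difference Metric, in which the distance between two attribute values is the sum over classes of |P(c | x) − P(c | y)|^q. The function names `fit_vdm`, `vdm_distance`, and `knn_predict`, the exponent `q = 2`, and the handling of unseen values are assumptions made for illustration; the iterative clustering optimization and joint training steps of the proposed algorithm are not covered here.

```python
from collections import Counter, defaultdict


def fit_vdm(rows, labels):
    """Estimate P(class | attribute = value) for every categorical attribute."""
    classes = sorted(set(labels))
    n_attrs = len(rows[0])
    # counts[a][value][label] = number of training rows with that value and label
    counts = [defaultdict(Counter) for _ in range(n_attrs)]
    for row, label in zip(rows, labels):
        for a, value in enumerate(row):
            counts[a][value][label] += 1
    probs = []
    for a in range(n_attrs):
        table = {}
        for value, ctr in counts[a].items():
            total = sum(ctr.values())
            table[value] = [ctr[c] / total for c in classes]
        probs.append(table)
    return probs, classes


def vdm_distance(x, y, probs, n_classes, q=2):
    """Sum of per-attribute value differences between samples x and y."""
    total = 0.0
    for a, (xv, yv) in enumerate(zip(x, y)):
        # Assumption of this sketch: a value never seen in training
        # gets an all-zero class profile.
        px = probs[a].get(xv, [0.0] * n_classes)
        py = probs[a].get(yv, [0.0] * n_classes)
        total += sum(abs(p - r) ** q for p, r in zip(px, py))
    return total


def knn_predict(query, rows, labels, probs, classes, k=3):
    """Majority vote among the k training rows closest to query under VDM."""
    ranked = sorted(
        zip(rows, labels),
        key=lambda pair: vdm_distance(query, pair[0], probs, len(classes)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because VDM compares class-conditional frequencies rather than raw codes, categorical fields can be used directly without imposing an arbitrary numeric ordering on them, which is the weakness of plain Euclidean KNN that the abstract points to.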

Key words: K-Nearest Neighbors (KNN) algorithm, Value Difference Metric (VDM) distance, financial crisis, behavior prediction, data mining

