Application of KNN algorithm based on value difference metric and clustering optimization in bank customer behavior prediction

doi:10.11772/j.issn.1001-9081.2019030571

Journal of Computer Applications, 2019, Vol. 39, Issue (9): 2784-2788.

• Frontier, Interdisciplinary, and Comprehensive Applications •

LI Bo^1,2, ZHANG Xiao^1,2, YAN Jingyi^3, LI Kewei^1, LI Heng^1,2, LING Yulong^1,2, ZHANG Yong^1,2

1. School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China;
2. Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology (Northwestern Polytechnical University), Xi'an, Shaanxi 710129, China;
3. School of Management, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China

Received: 2019-04-08 Revised: 2019-06-02 Online: 2019-06-10 Published: 2019-09-10
Corresponding author: LI Bo
About the authors: LI Bo (1994-), male, born in Longxi, Gansu, M.S. candidate, CCF member, research interests: cloud storage, data mining; ZHANG Xiao (1978-), male, born in Xinxiang, Henan, Ph.D., associate professor, CCF member, research interests: storage systems; YAN Jingyi (1993-), female (Hui ethnicity), born in Guilin, Guangxi, M.S., research interests: technological innovation management; LI Kewei (1993-), male, born in Yunmeng, Hubei, M.S. candidate, research interests: data mining; LI Heng (1993-), male, born in Zhoukou, Henan, M.S. candidate, research interests: data mining; LING Yulong (1995-), male, born in Suzhou, Anhui, M.S. candidate, research interests: data mining; ZHANG Yong (1995-), male, born in Lu'an, Anhui, M.S. candidate, research interests: data mining.
Supported by:
This work is partially supported by the National Key Research and Development Program of China (2018YFB1004401).

Abstract:

To improve the accuracy of behavior prediction for loan customers, and to address the inadequate handling of non-numerical attributes by the traditional K-Nearest Neighbors (KNN) algorithm in data analysis, an improved KNN algorithm was proposed that uses the Value Difference Metric (VDM) distance and iteratively optimizes the clustering results. First, the collected data were clustered by a KNN algorithm based on VDM distance; then, the clustering results were analyzed iteratively; finally, the prediction accuracy was improved through joint training. Experiments on customer data collected by Portuguese retail banks from 2008 to 2013 show that the improved KNN algorithm achieves better performance and stability than the traditional KNN algorithm, the Feature Correlation Difference KNN (FCD-KNN) algorithm, the Gaussian Naive Bayes algorithm, and the Gradient Boosting algorithm, and that it has considerable application value for predicting customer behavior from bank data.

Key words: K-Nearest Neighbors (KNN) algorithm, Value Difference Metric (VDM) distance, financial crisis, behavior prediction, data mining
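
The abstract names the VDM distance but does not give its formula. As a minimal sketch, the Python below uses the commonly cited simplified form, in which two values of a categorical attribute are close when they imply similar class distributions: the per-attribute distance is the sum over classes of |P(c | x) - P(c | y)|^q. Everything specific here is an assumption made for illustration: the function names, the toy customer records, the exponent q = 2, and the uniform fallback for unseen values. It covers only a plain VDM-based KNN vote and does not reproduce the paper's iterative cluster optimization or joint training.

```python
from collections import Counter, defaultdict


def fit_vdm(rows, labels):
    """Estimate P(class | attribute = value) for each categorical attribute."""
    classes = sorted(set(labels))
    # counts[a][v][c] = number of training rows with value v in attribute a and class c
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for a, v in enumerate(row):
            counts[a][v][label] += 1
    probs = {}
    for a, by_value in counts.items():
        probs[a] = {}
        for v, class_counts in by_value.items():
            total = sum(class_counts.values())
            probs[a][v] = [class_counts[c] / total for c in classes]
    return classes, probs


def vdm_distance(x, y, probs, n_classes, q=2):
    """Sum over attributes and classes of |P(c | x_a) - P(c | y_a)| ** q."""
    # Assumption: a value never seen in training gets a uniform class distribution.
    uniform = [1.0 / n_classes] * n_classes
    dist = 0.0
    for a, (xv, yv) in enumerate(zip(x, y)):
        px = probs[a].get(xv, uniform)
        py = probs[a].get(yv, uniform)
        dist += sum(abs(p - r) ** q for p, r in zip(px, py))
    return dist


def knn_predict(query, rows, labels, probs, n_classes, k=3):
    """Majority vote among the k training rows nearest to query under VDM."""
    ranked = sorted(
        range(len(rows)),
        key=lambda i: vdm_distance(query, rows[i], probs, n_classes),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (job, marital status) -> subscribed to a product?
rows = [
    ("admin", "married"),
    ("admin", "single"),
    ("student", "single"),
    ("student", "single"),
    ("retired", "married"),
    ("admin", "married"),
]
labels = ["no", "no", "yes", "yes", "yes", "no"]

classes, probs = fit_vdm(rows, labels)
same_profile = vdm_distance(("student", "single"), ("retired", "single"), probs, len(classes))
diff_profile = vdm_distance(("admin", "married"), ("student", "married"), probs, len(classes))
prediction = knn_predict(("retired", "single"), rows, labels, probs, len(classes), k=3)
print(same_profile, diff_profile, prediction)
```

In this toy data, "student" and "retired" have identical class distributions, so their VDM distance is zero even though the labels differ as strings. This is the property that lets KNN compare non-numerical attributes without an arbitrary numeric encoding.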

CLC number:

TP311.13

LI Bo, ZHANG Xiao, YAN Jingyi, LI Kewei, LI Heng, LING Yulong, ZHANG Yong. Application of KNN algorithm based on value difference metric and clustering optimization in bank customer behavior prediction[J]. Journal of Computer Applications, 2019, 39(9): 2784-2788.
