Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (7): 2256-2264. DOI: 10.11772/j.issn.1001-9081.2021050810
• Frontier and comprehensive applications •

Credit risk prediction model based on borderline adaptive SMOTE and Focal Loss improved LightGBM

Hailong CHEN, Chang YANG, Mei DU, Yingyu ZHANG
Received: 2021-05-18
Revised: 2021-09-29
Accepted: 2021-10-12
Online: 2022-07-15
Published: 2022-07-10
Contact: Hailong CHEN
About author: YANG Chang, born in 1997 in Suihua, Heilongjiang, M. S. candidate. Her research interests include machine learning.
Hailong CHEN, Chang YANG, Mei DU, Yingyu ZHANG. Credit risk prediction model based on borderline adaptive SMOTE and Focal Loss improved LightGBM[J]. Journal of Computer Applications, 2022, 42(7): 2256-2264.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021050810
No. | Feature | Importance | No. | Feature | Importance | No. | Feature | Importance
---|---|---|---|---|---|---|---|---
1 | loan_amnt | 0.1319 | 7 | verification_status | 0.0720 | 13 | pub_rec | 0.0226
2 | term | 0.1008 | 8 | dti | 0.0643 | 14 | revol_bal | 0.0194
3 | int_rate | 0.0915 | 9 | delinq_2yrs | 0.0612 | 15 | total_acc | 0.0193
4 | grade | 0.0814 | 10 | inq_last_6mths | 0.0581 | 16 | total_bc_limit | 0.0134
5 | home_ownership | 0.0803 | 11 | acc_open_past_24mths | 0.0511 | 17 | bc_util | 0.0126
6 | annual_inc | 0.0779 | 12 | open_acc | 0.0238 | 18 | out_prncp_inv | 0.0091
Tab.1 Features and their importances
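The importances in Tab.1 sum to 1 across the 18 retained Lending Club features. As an illustrative sketch only (the paper's exact feature-selection procedure is not reproduced here), a ranking of this form can be obtained from normalized gain importances of a fitted LightGBM model; `X` and `y` are assumed to hold the preprocessed features and binary default labels:

```python
import lightgbm as lgb
import pandas as pd

# Sketch: X is a DataFrame with the Tab.1 columns, y the binary default label.
model = lgb.LGBMClassifier(n_estimators=200, random_state=42)
model.fit(X, y)

# Gain-based importances, normalized to sum to 1 as in Tab.1.
gain = model.booster_.feature_importance(importance_type="gain")
importance = pd.Series(gain / gain.sum(), index=X.columns)
print(importance.sort_values(ascending=False).round(4))
```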
Algorithm | Parameters (b, ε) | F1-score | G-mean | AUC | KS
---|---|---|---|---|---
LightGBM | (0.5, 0.3) | 0.9715 | 0.6812 | 0.8251 | 0.5150
LightGBM | (0.5, 0.5) | 0.9729 | 0.6883 | 0.8260 | 0.5067
LightGBM | (1, 0.3) | 0.9765 | 0.6931 | 0.8309 | 0.5194
LightGBM | (1, 0.5) | 0.9773 | 0.7226 | 0.8360 | 0.5409
XGBoost | (0.5, 0.3) | 0.9718 | 0.6645 | 0.8124 | 0.5376
XGBoost | (0.5, 0.5) | 0.9707 | 0.6584 | 0.8174 | 0.5185
XGBoost | (1, 0.3) | 0.9746 | 0.6744 | 0.8261 | 0.5388
XGBoost | (1, 0.5) | 0.9765 | 0.6916 | 0.8268 | 0.5401
GBDT | (0.5, 0.3) | 0.9602 | 0.6307 | 0.7892 | 0.4483
GBDT | (0.5, 0.5) | 0.9710 | 0.6683 | 0.7901 | 0.4597
GBDT | (1, 0.3) | 0.9702 | 0.6499 | 0.7840 | 0.4537
GBDT | (1, 0.5) | 0.9705 | 0.6704 | 0.7903 | 0.4618
RF | (0.5, 0.3) | 0.9701 | 0.6572 | 0.7885 | 0.4669
RF | (0.5, 0.5) | 0.9612 | 0.6501 | 0.7861 | 0.4727
RF | (1, 0.3) | 0.9665 | 0.6487 | 0.7802 | 0.4588
RF | (1, 0.5) | 0.9709 | 0.6786 | 0.8077 | 0.4883
LR | (0.5, 0.3) | 0.8298 | 0.6310 | 0.7420 | 0.3755
LR | (0.5, 0.5) | 0.8315 | 0.6441 | 0.7627 | 0.4156
LR | (1, 0.3) | 0.8243 | 0.6528 | 0.7624 | 0.4096
LR | (1, 0.5) | 0.8334 | 0.6546 | 0.7645 | 0.4133
Tab.3 Comparison of classification performance under different (b, ε)
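Tab.3 shows (b, ε) = (1, 0.5) giving the best scores for every classifier. A minimal sketch of such a sweep, assuming a hypothetical `ba_smote(X, y, b=..., eps=...)` resampler standing in for the paper's BA-SMOTE (which has no public implementation):

```python
from itertools import product
from sklearn.metrics import roc_auc_score

def sweep_ba_smote(X_tr, y_tr, X_te, y_te, make_model,
                   grid_b=(0.5, 1.0), grid_eps=(0.3, 0.5)):
    """Grid-search the BA-SMOTE parameters (b, eps) of Tab.3 by test AUC."""
    scores = {}
    for b, eps in product(grid_b, grid_eps):
        # ba_smote is hypothetical: oversample the training set at (b, eps).
        X_res, y_res = ba_smote(X_tr, y_tr, b=b, eps=eps)
        model = make_model().fit(X_res, y_res)
        scores[(b, eps)] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    best = max(scores, key=scores.get)
    return best, scores
```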
Algorithm | F1-score | G-mean | AUC | KS
---|---|---|---|---
LightGBM | 0.9524 | 0.6017 | 0.7785 | 0.4372
BA-SMOTE-LightGBM | 0.9773 | 0.7226 | 0.8360 | 0.5409
BA-SMOTE-FLLightGBM | 0.9782 | 0.7832 | 0.8519 | 0.5657
Tab.4 Stage-by-stage comparison results of the different improvement methods
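The FLLightGBM stage in Tab.4 replaces LightGBM's default logloss with Focal Loss. A minimal sketch of a binary focal-loss custom objective (gradient and Hessian with respect to the raw score) that can be plugged into LightGBM; the α and γ values are illustrative defaults, not necessarily those used in the paper:

```python
import numpy as np
import lightgbm as lgb

def focal_loss_objective(alpha=0.25, gamma=2.0):
    """Binary focal loss FL = -alpha_t * (1 - p_t)^gamma * log(p_t),
    returned as (grad, hess) for a LightGBM custom objective."""
    def _objective(preds, train_data):
        y = train_data.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))                          # sigmoid of raw scores
        pt = np.clip(y * p + (1 - y) * (1 - p), 1e-9, 1 - 1e-9)   # true-class probability
        at = y * alpha + (1 - y) * (1 - alpha)                    # class weighting term
        log_pt = np.log(pt)
        grad = at * (2 * y - 1) * (1 - pt) ** gamma * (gamma * pt * log_pt - (1 - pt))
        hess = at * pt * (1 - pt) * (
            -gamma * (1 - pt) ** (gamma - 1) * (gamma * pt * log_pt - (1 - pt))
            + (1 - pt) ** gamma * (gamma * log_pt + gamma + 1)
        )
        # Focal loss is non-convex; flooring the Hessian is a common
        # practical safeguard to keep leaf updates stable.
        return grad, np.maximum(hess, 1e-9)
    return _objective

# Usage sketch (LightGBM >= 4.0 accepts a callable objective in params;
# older 3.x versions pass it to lgb.train as fobj=):
# booster = lgb.train({"objective": focal_loss_objective(0.25, 2.0)},
#                     lgb.Dataset(X_train, label=y_train), num_boost_round=200)
```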
Algorithm | F1-score | G-mean | AUC | KS
---|---|---|---|---
SMOTE-FLLightGBM | 0.9691 | 0.6415 | 0.7941 | 0.4621
Borderline-SMOTE-FLLightGBM | 0.9764 | 0.6989 | 0.8263 | 0.4845
ADASYN-FLLightGBM | 0.9775 | 0.6963 | 0.8124 | 0.4758
BA-SMOTE-FLLightGBM | 0.9782 | 0.7832 | 0.8519 | 0.5657
SMOTE-XGBoost | 0.9539 | 0.6074 | 0.7853 | 0.4524
Borderline-SMOTE-XGBoost | 0.9581 | 0.6298 | 0.7954 | 0.4665
ADASYN-XGBoost | 0.9629 | 0.6326 | 0.7867 | 0.4575
BA-SMOTE-XGBoost | 0.9765 | 0.6916 | 0.8268 | 0.5401
SMOTE-GBDT | 0.9566 | 0.6059 | 0.7822 | 0.4583
Borderline-SMOTE-GBDT | 0.9601 | 0.6182 | 0.7845 | 0.4547
ADASYN-GBDT | 0.9682 | 0.6024 | 0.7751 | 0.4463
BA-SMOTE-GBDT | 0.9705 | 0.6704 | 0.7903 | 0.4618
SMOTE-RF | 0.9682 | 0.5966 | 0.7786 | 0.4633
Borderline-SMOTE-RF | 0.9679 | 0.6227 | 0.7932 | 0.4708
ADASYN-RF | 0.9697 | 0.6161 | 0.7874 | 0.4592
BA-SMOTE-RF | 0.9709 | 0.6786 | 0.8077 | 0.4883
SMOTE-LR | 0.8216 | 0.5596 | 0.7395 | 0.3704
Borderline-SMOTE-LR | 0.8565 | 0.6035 | 0.7473 | 0.3756
ADASYN-LR | 0.8027 | 0.6249 | 0.7499 | 0.3778
BA-SMOTE-LR | 0.8334 | 0.6546 | 0.7645 | 0.4133
Tab.5 Comparison of experimental results of different oversampling methods
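The baseline oversamplers in Tab.5 (SMOTE, Borderline-SMOTE, ADASYN) are available in imbalanced-learn, so that part of the comparison can be sketched as below; BA-SMOTE is omitted for lack of a public implementation, a plain LGBMClassifier stands in for FLLightGBM, and `X_train`/`y_train`/`X_test`/`y_test` are assumed to be prepared splits:

```python
import lightgbm as lgb
from imblearn.over_sampling import SMOTE, BorderlineSMOTE, ADASYN
from imblearn.metrics import geometric_mean_score
from sklearn.metrics import f1_score, roc_auc_score

samplers = {
    "SMOTE": SMOTE(random_state=42),
    "Borderline-SMOTE": BorderlineSMOTE(random_state=42),
    "ADASYN": ADASYN(random_state=42),
}
for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_train, y_train)   # oversample minority class
    clf = lgb.LGBMClassifier(random_state=42).fit(X_res, y_res)
    y_prob = clf.predict_proba(X_test)[:, 1]
    y_pred = (y_prob >= 0.5).astype(int)
    print(f"{name}: F1={f1_score(y_test, y_pred):.4f} "
          f"G-mean={geometric_mean_score(y_test, y_pred):.4f} "
          f"AUC={roc_auc_score(y_test, y_prob):.4f}")
```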
Algorithm | F1-score | G-mean | AUC | KS
---|---|---|---|---
RUSBoost | 0.8537 | 0.5965 | 0.7465 | 0.3787
CUSBoost | 0.9352 | 0.6341 | 0.7815 | 0.4452
KSMOTE-AdaBoost | 0.9549 | 0.7056 | 0.7923 | 0.4536
AK-SMOTE-Catboost | 0.9617 | 0.7185 | 0.8115 | 0.4731
BA-SMOTE-FLLightGBM | 0.9782 | 0.7832 | 0.8519 | 0.5657
Tab.6 Comparison of results between the proposed model and other imbalanced classification algorithms
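All of the comparison tables report the same four imbalance-aware metrics. A small helper sketch, assuming predicted probabilities `y_prob` for the positive (default) class: G-mean is the geometric mean of sensitivity and specificity, and KS is the maximum TPR−FPR gap along the ROC curve:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score, roc_curve

def imbalance_metrics(y_true, y_prob, threshold=0.5):
    """F1-score, G-mean, AUC and KS as reported in Tab.3-Tab.7."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    g_mean = np.sqrt(tp / (tp + fn) * tn / (tn + fp))   # sqrt(sensitivity * specificity)
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    return {"F1-score": f1_score(y_true, y_pred),
            "G-mean": g_mean,
            "AUC": roc_auc_score(y_true, y_prob),
            "KS": np.max(tpr - fpr)}                     # max TPR-FPR gap
```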
Algorithm | F1-score | G-mean | AUC | KS
---|---|---|---|---
RUSBoost | 0.7668 | 0.5794 | 0.7428 | 0.3764
CUSBoost | 0.7812 | 0.5971 | 0.7473 | 0.3950
KSMOTE-AdaBoost | 0.8254 | 0.6201 | 0.7595 | 0.3984
AK-SMOTE-Catboost | 0.8344 | 0.6326 | 0.7613 | 0.4125
BA-SMOTE-FLLightGBM | 0.8560 | 0.6578 | 0.7884 | 0.4473
Tab.7 Comparison results of algorithms on the German credit dataset
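Tab.7 repeats the comparison on the public German credit dataset. A hedged loading sketch via OpenML (the paper's exact preprocessing and split are not specified here):

```python
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# "credit-g" on OpenML is the Statlog German credit data: 1 000 samples,
# 700 good / 300 bad borrowers (imbalance ratio 7:3).
german = fetch_openml(name="credit-g", version=1, as_frame=True)
X = pd.get_dummies(german.data)               # one-hot encode categorical columns
y = (german.target == "bad").astype(int)      # minority "bad" class as positive
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
```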