Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3107-3113. DOI: 10.11772/j.issn.1001-9081.2022091454
Special Issue: Artificial Intelligence
Jingtao ZHAO1,2, Zefang ZHAO1,2, Zhaojuan YUE1, Jun LI1,2
Received: 2022-09-30
Revised: 2022-12-15
Accepted: 2023-01-05
Online: 2023-03-17
Published: 2023-10-10
Contact: Jun LI
About author: ZHAO Jingtao, born in 1998, M.S. candidate. His research interests include recommendation systems and machine learning.
Jingtao ZHAO, Zefang ZHAO, Zhaojuan YUE, Jun LI. TenrepNN: practice of new ensemble learning paradigm in enterprise self-discipline evaluation[J]. Journal of Computer Applications, 2023, 43(10): 3107-3113.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091454
| Attribute | Description | Attribute | Description |
| --- | --- | --- | --- |
| A1 | Number of user complaints | A9 | Company type |
| A2 | Response rate to user complaints | A10 | Staff size |
| A3 | Listing status | A11 | Number of insured employees |
| A4 | Nature of enterprise | A12 | Number of administrative licenses |
| A5 | Place of registration | A13 | Number of judicial cases |
| A6 | Operating status | A14 | Number of administrative penalties |
| A7 | Date of establishment | A15 | Number of trustworthiness incentives |
| A8 | Registered capital | A16 | Number of dishonesty sanctions |

Tab. 1 Information of the enterprise self-discipline evaluation dataset
| Model | Error | Model | Error |
| --- | --- | --- | --- |
| LightGBM | 5.44 | LightGBMLarge | 5.93 |
| CatBoost | 5.54 | NeuralNetTorch | 6.01 |
| XGBoost | 5.60 | RandomForestMSE | 6.11 |
| LightGBMXT | 5.83 | ExtraTreesMSE | 6.26 |

Tab. 2 Performance of different models in the AutoGluon framework
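As a reference for reproducing Tab. 2, the following is a minimal sketch of benchmarking AutoGluon's built-in models on a tabular regression task; the label column name and file paths are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: per-model leaderboard from AutoGluon on a tabular
# regression task. The label column name ("score") and CSV paths are
# hypothetical placeholders.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # hypothetical training file
test_data = TabularDataset("test.csv")    # hypothetical held-out file

predictor = TabularPredictor(
    label="score",                         # assumed target-column name
    problem_type="regression",
    eval_metric="root_mean_squared_error",
).fit(train_data)

# Per-model scores on the held-out data, comparable to the errors in Tab. 2
print(predictor.leaderboard(test_data))
```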
| Self-discipline score | Grade | Description |
| --- | --- | --- |
| 90~100 | AAA | Good self-discipline: the enterprise controls the quality of online cultural information well and resolves problems quickly |
| 80~89 | AA | |
| 70~79 | A | Average self-discipline: the enterprise can handle problems arising in online culture, but not in a timely manner |
| 60~69 | B | |
| 0~59 | C | Poor self-discipline: the enterprise is in poor operating condition and cannot handle problems arising in online culture |

Tab. 3 Classification of enterprise self-discipline levels
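The score-to-grade mapping of Tab. 3 can be encoded directly; the function name below is illustrative.

```python
def self_discipline_grade(score: float) -> str:
    """Map an enterprise self-discipline score (0-100) to a Tab. 3 grade."""
    if score >= 90:
        return "AAA"
    if score >= 80:
        return "AA"
    if score >= 70:
        return "A"
    if score >= 60:
        return "B"
    return "C"

assert self_discipline_grade(94.5) == "AAA"
assert self_discipline_grade(59.9) == "C"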
| Model | Hyperparameter settings |
| --- | --- |
| CatBoost | learning rate 0.1; maximum depth 3; loss function RMSE; early stopping rounds 100 |
| LightGBM | learning rate 0.1; maximum depth 5; minimum samples per leaf 15; number of leaves 14; feature subsampling ratio 0.95; regularization weight 0.01; bagging fraction 0.6 |
| XGBoost | learning rate 0.07; maximum depth 3; gamma 0.5; maximum number of leaves 5; L1 regularization weight 0.5; L2 regularization weight 0.3 |

Tab. 4 Hyperparameter settings of the base models
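For reference, a sketch of instantiating the three first-level base regressors with the Tab. 4 values follows; the parameter-name mappings (e.g. which LightGBM regularization weight "0.01" refers to) are best-guess assumptions, not confirmed by the paper.

```python
# Sketch of the three base regressors configured per Tab. 4.
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

catboost_model = CatBoostRegressor(
    learning_rate=0.1,
    depth=3,
    loss_function="RMSE",
    early_stopping_rounds=100,
    verbose=False,
)

lightgbm_model = LGBMRegressor(
    learning_rate=0.1,
    max_depth=5,
    min_child_samples=15,   # minimum samples per leaf
    num_leaves=14,
    colsample_bytree=0.95,  # feature subsampling ratio
    reg_lambda=0.01,        # assumed L2; the paper says only "regularization weight"
    subsample=0.6,          # bagging fraction
    subsample_freq=1,       # required for subsample to take effect in LightGBM
)

xgboost_model = XGBRegressor(
    learning_rate=0.07,
    max_depth=3,
    gamma=0.5,
    max_leaves=5,
    reg_alpha=0.5,          # L1 regularization weight
    reg_lambda=0.3,         # L2 regularization weight
)
```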
| Hyperparameter | Value |
| --- | --- |
| Input feature dimension | 15 |
| Hidden-layer neurons | 10 |
| Output dimension | 15 |
| Batch size | 128 |
| Learning rate | 0.001 |
| L2 regularization coefficient | 0.0001 |
| Training epochs | 50 |

Tab. 5 Hyperparameter settings of the residual prediction neural network
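A minimal PyTorch sketch matching the dimensions in Tab. 5 is shown below; the section does not spell out the activation function or exact layer wiring, so the single ReLU hidden layer is an assumption.

```python
# Sketch of a small network with the Tab. 5 dimensions (15 -> 10 -> 15).
import torch
from torch import nn

class ResidualPredictionNet(nn.Module):
    def __init__(self, in_dim: int = 15, hidden: int = 10, out_dim: int = 15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),  # assumed activation, not stated in the table
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = ResidualPredictionNet()
# Adam with the learning rate and L2 coefficient (weight_decay) of Tab. 5;
# training would then run for 50 epochs with mini-batches of 128 samples.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```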
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| Linear regression | 2.553 | 93.06 |
| Ridge regression | 2.553 | 93.27 |
| Random forest | 2.446 | 92.65 |
| Neural network | 2.515 | 91.82 |
| XGBoost | 2.400 | 93.48 |
| LightGBM | 2.395 | 92.24 |
| CatBoost | 2.363 | 93.69 |

Tab. 6 Performance of single models on the enterprise self-discipline evaluation dataset
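The two metrics reported in Tabs. 6-10 can be computed as below; reading "accuracy" as the share of samples whose predicted grade (per Tab. 3) matches the true grade is our assumption about the evaluation protocol.

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error of predicted self-discipline scores."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def grade_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of samples whose predicted Tab. 3 grade matches the true grade."""
    to_grade = np.vectorize(self_discipline_grade)  # defined after Tab. 3 above
    return float(np.mean(to_grade(y_true) == to_grade(y_pred)))
```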
| Second-level model | RMSE | Accuracy/% |
| --- | --- | --- |
| Linear regression | 2.369 | 92.86 |
| CatBoost | 2.371 | 93.79 |
| XGBoost | 2.465 | 93.79 |
| LightGBM | 2.413 | 91.82 |
| Conventional neural network | 2.330 | 93.79 |
| Residual prediction neural network | 2.266 | 94.51 |

Tab. 7 Results under different second-level models
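As one plausible reading of the two-level design compared in Tab. 7 (an assumption-laden outline, not the paper's exact TenrepNN algorithm), the first level's base predictions feed a second-level model, and with the residual variant the network predicts a correction that is added to the combined base prediction.

```python
# Illustrative sketch of a two-level ensemble in the spirit of Tab. 7.
import numpy as np

def two_level_predict(base_models, second_level, X):
    """First level: base regressors; second level: residual correction."""
    base_preds = np.column_stack([m.predict(X) for m in base_models])
    first_level = base_preds.mean(axis=1)        # simple combination (assumed)
    residual = second_level.predict(base_preds)  # predicted correction
    return first_level + residual
```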
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| AutoGluon | 2.357 | 93.79 |
| TenrepNN | 2.266 | 94.51 |

Tab. 8 Comparison of the proposed model with the AutoML framework AutoGluon
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| Without the residual prediction neural network | 2.319 | 93.79 |
| TenrepNN | 2.266 | 94.51 |

Tab. 9 Ablation results for the residual prediction neural network
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| AdaBoost | 2.843 | 85.40 |
| Stacking | 2.832 | 91.20 |
| Bagging | 2.337 | 94.10 |
| GBDT | 2.287 | 94.72 |
| TenrepNN | 2.266 | 94.51 |

Tab. 10 Comparison of the proposed model with other ensemble learning methods
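The baseline ensembles of Tab. 10 can be reproduced with scikit-learn's stock implementations, as sketched below; whether the paper used these exact classes and defaults is an assumption.

```python
# Sketch of the Tab. 10 baseline ensembles using scikit-learn.
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression

baselines = {
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "Stacking": StackingRegressor(
        estimators=[
            ("lgbm", lightgbm_model),  # base models from the Tab. 4 sketch
            ("xgb", xgboost_model),
            ("cat", catboost_model),
        ],
        final_estimator=LinearRegression(),
    ),
}
```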