Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3107-3113. DOI: 10.11772/j.issn.1001-9081.2022091454
Special Issue: Artificial Intelligence
• Artificial intelligence •

TenrepNN: practice of new ensemble learning paradigm in enterprise self-discipline evaluation

Jingtao ZHAO1,2, Zefang ZHAO1,2, Zhaojuan YUE1, Jun LI1,2
Received: 2022-09-30
Revised: 2022-12-15
Accepted: 2023-01-05
Online: 2023-03-17
Published: 2023-10-10
Contact: Jun LI
About author: ZHAO Jingtao, born in 1998 in Liaocheng, Shandong, M.S. candidate. His research interests include recommendation systems and machine learning.
Jingtao ZHAO, Zefang ZHAO, Zhaojuan YUE, Jun LI. TenrepNN: practice of new ensemble learning paradigm in enterprise self-discipline evaluation[J]. Journal of Computer Applications, 2023, 43(10): 3107-3113.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091454
| Attribute | Description | Attribute | Description |
|---|---|---|---|
| A1 | Number of user complaints | A9 | Company type |
| A2 | User complaint response rate | A10 | Staff size |
| A3 | Listing status | A11 | Number of insured employees |
| A4 | Nature of enterprise | A12 | Number of administrative licenses |
| A5 | Place of registration | A13 | Number of judicial cases |
| A6 | Operating status | A14 | Number of administrative penalties |
| A7 | Date of establishment | A15 | Number of trustworthiness incentives |
| A8 | Registered capital | A16 | Number of dishonesty punishments |

Tab. 1 Information of enterprise self-discipline evaluation dataset
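For reference, a minimal pandas sketch of how the Tab. 1 attributes might be organized for preprocessing. Every column name below is a hypothetical stand-in, since the paper's actual field names are not given in this section; A7 (date of establishment) is assumed to be converted into an enterprise-age feature.

```python
import pandas as pd

# Hypothetical column layout mirroring attributes A1-A16 in Tab. 1.
NUMERIC_COLS = [
    "complaint_count",       # A1: number of user complaints
    "complaint_reply_rate",  # A2: user complaint response rate
    "registered_capital",    # A8
    "insured_employees",     # A11
    "license_count",         # A12
    "judicial_case_count",   # A13
    "penalty_count",         # A14
    "incentive_count",       # A15
    "punishment_count",      # A16
]
CATEGORICAL_COLS = [
    "listing_status",      # A3
    "enterprise_nature",   # A4
    "registration_place",  # A5
    "operating_status",    # A6
    "company_type",        # A9
    "staff_size",          # A10
]

def load_features(path: str) -> pd.DataFrame:
    """Load the dataset and derive an age feature from A7 (date of establishment)."""
    df = pd.read_csv(path)
    # Assumption: turn the establishment date into enterprise age in years.
    df["enterprise_age"] = 2022 - pd.to_datetime(df["established_date"]).dt.year
    df[CATEGORICAL_COLS] = df[CATEGORICAL_COLS].astype("category")
    return df
```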
| Model | Error | Model | Error |
|---|---|---|---|
| LightGBM | 5.44 | LightGBMLarge | 5.93 |
| CatBoost | 5.54 | NeuralNetTorch | 6.01 |
| XGBoost | 5.60 | RandomForestMSE | 6.11 |
| LightGBMXT | 5.83 | ExtraTreesMSE | 6.26 |

Tab. 2 Performance of different models in AutoGluon framework
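A leaderboard like Tab. 2 can be produced in outline with AutoGluon's TabularPredictor; a minimal sketch follows, where the label column name "score" and the train_df/test_df pandas split are assumptions.

```python
from autogluon.tabular import TabularPredictor

# Fit AutoGluon on the tabular data and list per-model validation errors,
# covering the models compared in Tab. 2 (LightGBM, CatBoost, XGBoost, ...).
predictor = TabularPredictor(
    label="score",
    problem_type="regression",
    eval_metric="root_mean_squared_error",
).fit(train_df)
print(predictor.leaderboard(test_df))
```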
| Enterprise self-discipline score | Grade | Description |
|---|---|---|
| 90~100 | AAA | Good self-discipline: the enterprise effectively controls the quality of online cultural information and resolves problems promptly |
| 80~89 | AA | |
| 70~79 | A | Average self-discipline: the enterprise can handle problems arising in online culture, but not in a timely manner |
| 60~69 | B | |
| 0~59 | C | Poor self-discipline: the enterprise is in poor operating condition and cannot handle problems arising in online culture |

Tab. 3 Classification of enterprise self-discipline level
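The score-to-grade mapping in Tab. 3 is a simple thresholding rule; a direct Python transcription:

```python
def score_to_grade(score: float) -> str:
    """Map an enterprise self-discipline score (0-100) to its Tab. 3 grade."""
    if score >= 90:
        return "AAA"
    if score >= 80:
        return "AA"
    if score >= 70:
        return "A"
    if score >= 60:
        return "B"
    return "C"
```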
| Model | Hyperparameter setting |
|---|---|
| CatBoost | Learning rate 0.1, max depth 3, loss function RMSE, early stopping rounds 100 |
| LightGBM | Learning rate 0.1, max depth 5, min samples per leaf 15, number of leaves 14, feature sampling ratio 0.95, regularization weight 0.01, bagging fraction 0.6 |
| XGBoost | Learning rate 0.07, max depth 3, gamma 0.5, max leaf nodes 5, L1 regularization weight 0.5, L2 regularization weight 0.3 |

Tab. 4 Hyperparameter setting of base models
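Under the usual Python APIs, the Tab. 4 settings might be instantiated as below. Mapping the table's wording onto concrete parameter names is an assumption (for example, LightGBM's single "regularization weight" is taken here as reg_lambda).

```python
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

cat_model = CatBoostRegressor(
    learning_rate=0.1, depth=3, loss_function="RMSE",
    early_stopping_rounds=100,  # takes effect when an eval_set is passed to fit()
    verbose=False,
)
lgb_model = LGBMRegressor(
    learning_rate=0.1, max_depth=5, min_child_samples=15, num_leaves=14,
    colsample_bytree=0.95, reg_lambda=0.01,  # reg_lambda mapping is assumed
    subsample=0.6, subsample_freq=1,  # subsample_freq > 0 enables bagging
)
xgb_model = XGBRegressor(
    learning_rate=0.07, max_depth=3, gamma=0.5,
    max_leaves=5, tree_method="hist",  # max_leaves needs hist-based tree growth
    reg_alpha=0.5, reg_lambda=0.3,
)
```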
| Hyperparameter | Value |
|---|---|
| Input feature dimension | 15 |
| Number of hidden-layer neurons | 10 |
| Output dimension | 15 |
| Batch size | 128 |
| Learning rate | 0.001 |
| L2 regularization coefficient | 0.0001 |
| Training epochs | 50 |

Tab. 5 Hyperparameter setting of residual prediction neural network
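A minimal PyTorch sketch matching the Tab. 5 shapes (15 → 10 → 15). How the residual prediction network is wired into TenrepNN is not specified by the tables alone, so only the listed hyperparameters are reflected, and MSE is an assumed loss.

```python
import torch
from torch import nn

class ResidualPredictionNet(nn.Module):
    """One hidden layer sized per Tab. 5: input 15, hidden 10, output 15."""
    def __init__(self, in_dim: int = 15, hidden: int = 10, out_dim: int = 15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = ResidualPredictionNet()
# Adam with the listed learning rate; weight_decay stands in for the
# L2 regularization coefficient of 0.0001.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()  # loss not stated in Tab. 5; MSE assumed
# Training would then run for 50 epochs with mini-batches of 128 samples.
```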
| Model | RMSE | Accuracy/% |
|---|---|---|
| Linear regression | 2.553 | 93.06 |
| Ridge regression | 2.553 | 93.27 |
| Random forest | 2.446 | 92.65 |
| Neural network | 2.515 | 91.82 |
| XGBoost | 2.400 | 93.48 |
| LightGBM | 2.395 | 92.24 |
| CatBoost | 2.363 | 93.69 |

Tab. 6 Performance of single models on enterprise self-discipline evaluation dataset
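Tab. 6 and the tables that follow report RMSE together with an accuracy in percent. A plausible reading, assumed here, is that accuracy counts predictions whose Tab. 3 grade matches the grade of the true score; a sketch using score_to_grade from above:

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[float, float]:
    """RMSE and grade accuracy (in %) for predicted self-discipline scores."""
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    # Assumption: "accuracy" means the predicted score lands in the
    # same Tab. 3 grade band as the true score.
    hits = [score_to_grade(t) == score_to_grade(p) for t, p in zip(y_true, y_pred)]
    return rmse, 100.0 * float(np.mean(hits))
```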
| Second-layer model | RMSE | Accuracy/% |
|---|---|---|
| Linear regression | 2.369 | 92.86 |
| CatBoost | 2.371 | 93.79 |
| XGBoost | 2.465 | 93.79 |
| LightGBM | 2.413 | 91.82 |
| Traditional neural network | 2.330 | 93.79 |
| Residual prediction neural network | 2.266 | 94.51 |

Tab. 7 Results under different second-layer models
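Tab. 7 varies the second-layer model placed on top of the Tab. 4 base learners. Below is a generic two-layer sketch in the stacking style, reusing cat_model, lgb_model, and xgb_model from the earlier sketch; X_train/y_train are assumed, Ridge stands in for any Tab. 7 second-layer model, and TenrepNN's actual residual-prediction integration is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

# Out-of-fold predictions of the base learners become the second layer's input.
# (For plain CV, CatBoost's early_stopping_rounds needs an eval_set or should
# be dropped from the constructor.)
base_models = [cat_model, lgb_model, xgb_model]
meta_features = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5) for m in base_models
])
second_layer = Ridge()
second_layer.fit(meta_features, y_train)
```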
| Model | RMSE | Accuracy/% |
|---|---|---|
| AutoGluon | 2.357 | 93.79 |
| TenrepNN | 2.266 | 94.51 |

Tab. 8 Comparison of the proposed model with the automated machine learning framework AutoGluon
| Model | RMSE | Accuracy/% |
|---|---|---|
| Model without residual prediction neural network | 2.319 | 93.79 |
| TenrepNN | 2.266 | 94.51 |

Tab. 9 Ablation results for the residual prediction neural network
| Model | RMSE | Accuracy/% |
|---|---|---|
| AdaBoost | 2.843 | 85.40 |
| Stacking | 2.832 | 91.20 |
| Bagging | 2.337 | 94.10 |
| GBDT | 2.287 | 94.72 |
| TenrepNN | 2.266 | 94.51 |

Tab. 10 Comparison of the proposed model with other ensemble learning methods
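For context, the Tab. 10 baselines correspond to standard scikit-learn ensembles. A hypothetical comparison loop is sketched below, reusing the base learners and the evaluate helper from the earlier sketches; it runs with library-default hyperparameters rather than the paper's settings, so the resulting numbers will differ, and X_train/y_train/X_test/y_test are assumed.

```python
from sklearn.ensemble import (
    AdaBoostRegressor, BaggingRegressor,
    GradientBoostingRegressor, StackingRegressor,
)
from sklearn.linear_model import Ridge

baselines = {
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "Stacking": StackingRegressor(
        estimators=[("cat", cat_model), ("lgb", lgb_model), ("xgb", xgb_model)],
        final_estimator=Ridge(),
    ),
}
for name, est in baselines.items():
    est.fit(X_train, y_train)
    rmse, acc = evaluate(y_test, est.predict(X_test))
    print(f"{name}: RMSE={rmse:.3f}, Accuracy={acc:.2f}%")
```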