TenrepNN: practice of new ensemble learning paradigm in enterprise self-discipline evaluation

Journal of Computer Applications, 2023, Vol. 43, Issue (10): 3107-3113. DOI: 10.11772/j.issn.1001-9081.2022091454

Jingtao ZHAO¹,², Zefang ZHAO¹,², Zhaojuan YUE¹, Jun LI¹,²

1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2022-09-30; Revised: 2022-12-15; Accepted: 2023-01-05; Online: 2023-03-17; Published: 2023-10-10
Contact: Jun LI
About the authors:
ZHAO Jingtao, born in 1998, a native of Liaocheng, Shandong, M. S. candidate. His research interests include recommendation systems and machine learning.
ZHAO Zefang, born in 1996, a native of Linfen, Shanxi, Ph. D. candidate. His research interests include natural language processing and sentiment analysis.
YUE Zhaojuan, born in 1984, a native of Zhumadian, Henan, Ph. D., senior engineer. Her research interests include computational communication and data mining.
Supported by:
National Key Research and Development Program of China (2019YFB1405801)

Abstract:

To address the low self-discipline of enterprises, the frequent violations, and the difficulty of government supervision in the internet environment, a Two-layer ensemble residual prediction Neural Network (TenrepNN) model is proposed for evaluating enterprise self-discipline, and a new ensemble learning paradigm named Adjusting is designed by combining the ideas of Stacking and Bagging. The TenrepNN model has a two-layer structure. In the first layer, three base learners make a preliminary prediction of the enterprise score. In the second layer, following the idea of residual correction, a residual prediction neural network predicts the output deviation of each base learner. The final output is obtained by adding the predicted deviations to the base learner scores. On the enterprise self-discipline evaluation dataset, the Root Mean Square Error (RMSE) of the TenrepNN model is 2.7% lower than that of a traditional neural network, and the classification accuracy for the self-discipline level reaches 94.51%. The experimental results show that, by integrating different base learners to reduce prediction variance and using the residual prediction neural network to reduce bias explicitly, the TenrepNN model can evaluate enterprise self-discipline accurately and thus support differentiated, dynamic supervision.

Key words: enterprise self-discipline evaluation, ensemble learning paradigm, residual prediction neural network, explicit deviation correction, internet enterprise supervision
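The two-layer idea in the abstract (base learners first, then a learned correction of each base learner's deviation) can be illustrated with a toy sketch. This is not the authors' implementation. The three base learners below are hypothetical, deliberately biased stand-ins; a one-variable least-squares fit stands in for the residual prediction neural network; and averaging the corrected outputs is an assumption, since the abstract does not say how the corrected scores are combined.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of ys ~ a * xs + b for a single input variable."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx


# Layer 1: three hypothetical base learners (stand-ins for trained models).
base_learners = [
    lambda x: 0.9 * x,        # systematically underestimates
    lambda x: x + 3.0,        # constant positive offset
    lambda x: 1.1 * x - 2.0,  # slope too steep
]


def fit_correctors(xs, ys):
    """Layer 2: per base learner, fit a model that predicts its residual.

    Assumption: the corrector sees only the base prediction as input and is
    linear; the paper uses a neural network instead.
    """
    correctors = []
    for learner in base_learners:
        preds = [learner(x) for x in xs]
        residuals = [y - p for y, p in zip(ys, preds)]
        correctors.append(fit_line(preds, residuals))
    return correctors


def predict(x, correctors):
    """Add each predicted deviation to its base score, then average (assumed)."""
    corrected = []
    for learner, (a, b) in zip(base_learners, correctors):
        p = learner(x)
        corrected.append(p + (a * p + b))
    return mean(corrected)


# Toy data in which the true score equals the single input feature.
train_x = [50.0, 60.0, 70.0, 80.0, 90.0]
train_y = [50.0, 60.0, 70.0, 80.0, 90.0]
correctors = fit_correctors(train_x, train_y)
uncorrected = mean(learner(75.0) for learner in base_learners)
adjusted = predict(75.0, correctors)
```

In this toy setting each base learner's error is an exact linear function of its own prediction, so the correction removes it completely; on real data the residual model would only reduce the error.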

CLC number: TP391.4

Jingtao ZHAO, Zefang ZHAO, Zhaojuan YUE, Jun LI. TenrepNN: practice of new ensemble learning paradigm in enterprise self-discipline evaluation[J]. Journal of Computer Applications, 2023, 43(10): 3107-3113.


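Attribute	Description	Attribute	Description
A1	Number of user complaints	A9	Company type
A2	User complaint response rate	A10	Staff size
A3	Listing status	A11	Number of insured employees
A4	Enterprise nature	A12	Number of administrative licenses
A5	Place of registration	A13	Number of judicial cases
A6	Operating status	A14	Number of administrative penalties
A7	Date of establishment	A15	Number of trustworthiness incentives
A8	Registered capital	A16	Number of dishonesty punishments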


Model	Error	Model	Error
LightGBM	5.44	LightGBMLarge	5.93
CatBoost	5.54	NeuralNetTorch	6.01
XGBoost	5.60	RandomForestMSE	6.11
LightGBMXT	5.83	ExtraTreesMSE	6.26


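Enterprise self-discipline score	Grade	Description
90-100	AAA	Good self-discipline; controls the quality of online cultural information well and handles problems quickly
80-89	AA	Good self-discipline; controls the quality of online cultural information well and handles problems quickly
70-79	A	Average self-discipline; can handle problems that arise in online culture, but not in a timely manner
60-69	B	Average self-discipline; can handle problems that arise in online culture, but not in a timely manner
0-59	C	Poor self-discipline and poor operating condition; cannot handle problems that arise in online culture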
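The score bands in the table above translate directly into a lookup. The sketch below is illustrative only; the function name `grade` is hypothetical, and treating each band's lower bound as the cut-off for fractional scores (so that 89.5 falls into AA) is an assumption, because the table lists integer ranges only.

```python
def grade(score: float) -> str:
    """Map a 0-100 self-discipline score to its grade from the table."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Assumed cut-offs: the lower bound of each band in the table.
    for lower_bound, label in ((90, "AAA"), (80, "AA"), (70, "A"), (60, "B")):
        if score >= lower_bound:
            return label
    return "C"


grades = [grade(s) for s in (95, 89, 70, 60, 59)]
```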


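Hyperparameter	Value
Input feature dimension	15
Number of hidden-layer neurons	10
Output dimension	15
Batch size	128
Learning rate	0.001
L2 regularization coefficient	0.0001
Training epochs	50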
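For reference, the values in the table above can be gathered into a small configuration object. This is a sketch under assumptions: the class and field names are hypothetical, and the parameter count assumes a single fully connected hidden layer with biases, which the table does not state.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class NetworkConfig:
    """Hyperparameters from the table; names are illustrative."""
    input_dim: int = 15
    hidden_units: int = 10
    output_dim: int = 15
    batch_size: int = 128
    learning_rate: float = 0.001
    l2_coefficient: float = 0.0001
    epochs: int = 50

    def parameter_count(self) -> int:
        """Weights plus biases, assuming one fully connected hidden layer."""
        hidden = self.input_dim * self.hidden_units + self.hidden_units
        output = self.hidden_units * self.output_dim + self.output_dim
        return hidden + output

    def steps_per_epoch(self, n_samples: int) -> int:
        """Number of mini-batches needed to pass over n_samples once."""
        return ceil(n_samples / self.batch_size)


config = NetworkConfig()
n_params = config.parameter_count()
# 1000 is a made-up sample count, used only to show the batch arithmetic.
n_steps = config.steps_per_epoch(1000)
```

Under that single-hidden-layer assumption the network is small (a few hundred trainable parameters).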

Model	RMSE	Accuracy/%
Linear regression	2.553	93.06
Ridge regression	2.553	93.27
Random forest	2.446	92.65
Neural network	2.515	91.82
XGBoost	2.400	93.48
LightGBM	2.395	92.24
CatBoost	2.363	93.69
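The two columns in the table above are a regression error (RMSE on the predicted score) and a classification accuracy on the self-discipline level. A minimal sketch of how both could be computed from the same predictions follows. It assumes accuracy is obtained by mapping true and predicted scores to grades and comparing them, which the page does not spell out; the helper names and the sample scores are made up for illustration.

```python
from math import sqrt


def grade(score: float) -> str:
    """Grade bands from the scoring table; lower bounds are assumed cut-offs."""
    for lower_bound, label in ((90, "AAA"), (80, "AA"), (70, "A"), (60, "B")):
        if score >= lower_bound:
            return label
    return "C"


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def level_accuracy(y_true, y_pred):
    """Share of samples whose predicted grade equals the true grade."""
    hits = sum(grade(t) == grade(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up scores, for illustration only.
y_true = [92.0, 85.0, 71.0, 64.0, 40.0]
y_pred = [90.5, 86.0, 69.0, 66.0, 45.0]
error = rmse(y_true, y_pred)
accuracy = level_accuracy(y_true, y_pred)
```

The third sample shows why the two metrics can disagree: a 2-point error is small in RMSE terms but crosses the A/B boundary, so it counts as a misclassification.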

Second-layer model	RMSE	Accuracy/%
Linear regression	2.369	92.86
CatBoost	2.371	93.79
XGBoost	2.465	93.79
LightGBM	2.413	91.82
Traditional neural network	2.330	93.79
Residual prediction neural network	2.266	94.51
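The 2.7% RMSE reduction quoted in the abstract appears to correspond to the last two rows of this table (traditional neural network versus residual prediction neural network as the second layer). The arithmetic can be checked with a short sketch; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100


# RMSE values copied from the second-layer comparison table.
rmse_traditional_nn = 2.330
rmse_residual_nn = 2.266
reduction = relative_reduction(rmse_traditional_nn, rmse_residual_nn)
```

The result is roughly 2.75%, which matches the abstract's figure of 2.7% if that figure is truncated rather than rounded.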

Model	RMSE	Accuracy/%
Model without the residual prediction neural network	2.319	93.79
TenrepNN	2.266	94.51

Model	RMSE	Accuracy/%
AdaBoost	2.843	85.40
Stacking	2.832	91.20
Bagging	2.337	94.10
GBDT	2.287	94.72
TenrepNN	2.266	94.51
