TenrepNN: practice of a new ensemble learning paradigm in enterprise self-discipline evaluation

doi:10.11772/j.issn.1001-9081.2022091454

Journal of Computer Applications, 2023, Vol. 43, Issue 10: 3107-3113.

Special topic: Artificial Intelligence

Jingtao ZHAO¹,², Zefang ZHAO¹,², Zhaojuan YUE¹, Jun LI¹,²

1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2022-09-30 Revised: 2022-12-15 Accepted: 2023-01-05 Online: 2023-03-17 Published: 2023-10-10
Corresponding author: Jun LI
About the authors: ZHAO Jingtao, born in 1998, from Liaocheng, Shandong, M.S. candidate. His research interests include recommender systems and machine learning.
ZHAO Zefang, born in 1996, from Linfen, Shanxi, Ph.D. candidate. His research interests include natural language processing and sentiment analysis.
YUE Zhaojuan, born in 1984, from Zhumadian, Henan, Ph.D., senior engineer. Her research interests include computational communication and data mining.
Supported by:
National Key Research and Development Program of China (2019YFB1405801)

Abstract:

To address the low self-discipline of enterprises, the frequent violations and the difficulty of government supervision in the internet environment, a Two-layer ensemble residual prediction Neural Network (TenrepNN) model is proposed for evaluating enterprise self-discipline, and a new ensemble learning paradigm named Adjusting is designed by combining the ideas of Stacking and Bagging. TenrepNN has a two-layer structure. In the first layer, three base learners make a preliminary prediction of the enterprise score. In the second layer, following the idea of residual correction, a residual prediction neural network predicts the output deviation of each base learner. The final output is obtained by adding the predicted deviations to the base learner scores. On the enterprise self-discipline evaluation dataset, TenrepNN reduces the Root Mean Square Error (RMSE) by 2.7% compared with a traditional neural network, and its classification accuracy for self-discipline levels reaches 94.51%. The experimental results show that, by integrating different base learners to reduce the prediction variance and by using the residual prediction neural network to reduce the deviation (bias) explicitly, TenrepNN can evaluate enterprise self-discipline accurately and thereby support differentiated, dynamic supervision.

Key words: enterprise self-discipline evaluation, ensemble learning paradigm, residual prediction neural network, explicit deviation correction, internet enterprise supervision
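
The two-layer prediction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `tenrep_predict`, `learners` and `predict_deviations` are hypothetical, the residual predictor is assumed to take only the feature vector as input, and the corrected scores are assumed to be averaged, since the abstract does not say how they are combined into one final score.

```python
from statistics import fmean
from typing import Callable, Sequence

Features = Sequence[float]
BaseLearner = Callable[[Features], float]
# ASSUMPTION: the residual predictor sees only the features and returns
# one predicted deviation per base learner.
ResidualPredictor = Callable[[Features], Sequence[float]]


def tenrep_predict(
    x: Features,
    base_learners: Sequence[BaseLearner],
    residual_predictor: ResidualPredictor,
) -> float:
    """Two-layer prediction: base scores plus predicted per-learner deviations."""
    # Layer 1: each base learner gives a preliminary enterprise score.
    base_scores = [learner(x) for learner in base_learners]
    # Layer 2: predict the deviation of each base learner's output.
    deviations = residual_predictor(x)
    if len(deviations) != len(base_scores):
        raise ValueError("need exactly one deviation per base learner")
    # Correct each base score by adding its predicted deviation.
    corrected = [s + d for s, d in zip(base_scores, deviations)]
    # ASSUMPTION: average the corrected scores to get a single output.
    return fmean(corrected)


# Hypothetical stand-ins for three trained base learners and a trained
# residual prediction network.
learners = [lambda x: 80.0, lambda x: 84.0, lambda x: 79.0]
predict_deviations = lambda x: [2.0, -2.0, 3.0]
final_score = tenrep_predict([0.0] * 16, learners, predict_deviations)
```

In this toy run each base score is shifted by its predicted deviation before the scores are combined, which is the "explicit" correction step the abstract contrasts with plain Stacking or Bagging.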

CLC number: TP391.4

Jingtao ZHAO, Zefang ZHAO, Zhaojuan YUE, Jun LI. TenrepNN: practice of new ensemble learning paradigm in enterprise self-discipline evaluation[J]. Journal of Computer Applications, 2023, 43(10): 3107-3113.



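Attribute	Description	Attribute	Description
A1	Number of user complaints	A9	Company type
A2	User complaint response rate	A10	Staff size
A3	Listing status	A11	Number of insured employees
A4	Enterprise nature	A12	Number of administrative licenses
A5	Place of registration	A13	Number of judicial cases
A6	Operating status	A14	Number of administrative penalties
A7	Date of establishment	A15	Number of trustworthiness incentives
A8	Registered capital	A16	Number of dishonesty punishments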

Model	Error	Model	Error
LightGBM	5.44	LightGBMLarge	5.93
CatBoost	5.54	NeuralNetTorch	6.01
XGBoost	5.60	RandomForestMSE	6.11
LightGBMXT	5.83	ExtraTreesMSE	6.26

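Enterprise self-discipline score	Level	Description
90-100	AAA	Good self-discipline; controls the quality of online cultural information well and handles problems quickly
80-89	AA	Good self-discipline; controls the quality of online cultural information well and handles problems quickly
70-79	A	Average self-discipline; can handle problems that arise in online culture, but not in a timely manner
60-69	B	Average self-discipline; can handle problems that arise in online culture, but not in a timely manner
0-59	C	Poor self-discipline; poor operating condition; cannot handle problems that arise in online culture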
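
The score-to-level mapping in the table above can be written as a small lookup. This is a sketch only: the function name `self_discipline_level` is hypothetical, and the table lists integer ranges, so treating each band's lower bound as a threshold for fractional scores (for example, 89.5 falling in AA) is an assumption.

```python
def self_discipline_level(score: float) -> str:
    """Map an enterprise self-discipline score (0 to 100) to its level."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Lower bound of each band from the table, checked from highest to lowest.
    # ASSUMPTION: fractional scores belong to the band whose lower bound
    # they reach, e.g. 89.5 is still AA.
    for lower_bound, level in ((90, "AAA"), (80, "AA"), (70, "A"), (60, "B")):
        if score >= lower_bound:
            return level
    return "C"
```

For instance, `self_discipline_level(85)` falls in the 80-89 band and yields `"AA"`.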


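Hyperparameter	Value
Input feature dimension	15
Number of hidden-layer neurons	10
Output dimension	15
Batch size	128
Learning rate	0.001
L2 regularization coefficient	0.0001
Number of training epochs	50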
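
For reference, the hyperparameters above could be collected in a single configuration object, as in the sketch below. The class and field names are hypothetical, and it is an assumption that the table describes the residual prediction neural network. The parameter count further assumes one fully connected hidden layer with bias terms, which the page does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidualNetConfig:
    """Hyperparameter values copied from the table; names are illustrative."""
    input_dim: int = 15
    hidden_units: int = 10
    output_dim: int = 15
    batch_size: int = 128
    learning_rate: float = 0.001
    l2_coefficient: float = 0.0001
    epochs: int = 50

    def parameter_count(self) -> int:
        # ASSUMPTION: a single fully connected hidden layer, with biases.
        hidden = self.input_dim * self.hidden_units + self.hidden_units
        output = self.hidden_units * self.output_dim + self.output_dim
        return hidden + output


config = ResidualNetConfig()
```

Under those assumptions the network is very small (a few hundred weights), which is consistent with the short training schedule of 50 epochs listed in the table.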

Model	RMSE	Accuracy/%
Linear regression	2.553	93.06
Ridge regression	2.553	93.27
Random forest	2.446	92.65
Neural network	2.515	91.82
XGBoost	2.400	93.48
LightGBM	2.395	92.24
CatBoost	2.363	93.69
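
The two metrics reported in these tables can be computed as sketched below. This is a minimal illustration with made-up scores, not the paper's data. It assumes that the accuracy column compares self-discipline levels (AAA to C) derived from the true and predicted scores, which is a reading of the abstract rather than something the page states.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true and predicted scores."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy_percent(true_levels: Sequence[str], pred_levels: Sequence[str]) -> float:
    """Percentage of samples whose predicted level matches the true level."""
    if len(true_levels) != len(pred_levels) or len(true_levels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(t == p for t, p in zip(true_levels, pred_levels))
    return 100.0 * hits / len(true_levels)


# Made-up example: four enterprises with true and predicted scores,
# and the levels those scores fall into.
true_scores = [92.0, 85.0, 71.0, 58.0]
pred_scores = [90.0, 88.0, 69.0, 55.0]
true_levels = ["AAA", "AA", "A", "C"]
pred_levels = ["AAA", "AA", "B", "C"]

example_rmse = rmse(true_scores, pred_scores)
example_accuracy = accuracy_percent(true_levels, pred_levels)
```

The third enterprise shows why the two metrics can disagree: its score error is only 2 points, yet the prediction crosses the 70-point boundary and counts as a level misclassification.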

Layer-2 model	RMSE	Accuracy/%
Linear regression	2.369	92.86
CatBoost	2.371	93.79
XGBoost	2.465	93.79
LightGBM	2.413	91.82
Traditional neural network	2.330	93.79
Residual prediction neural network	2.266	94.51

Model	RMSE	Accuracy/%
Model without the residual prediction neural network	2.319	93.79
TenrepNN	2.266	94.51
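
The relative RMSE reductions implied by these tables can be checked with a short calculation. The pairing of values is an interpretation: the 2.7% figure in the abstract appears to correspond to the traditional neural network used as the layer-2 model (2.330) versus the residual prediction neural network (2.266). The function name is hypothetical.

```python
def relative_reduction_percent(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# RMSE values taken from the tables above.
vs_traditional_nn = relative_reduction_percent(2.330, 2.266)   # about 2.75
vs_no_residual_net = relative_reduction_percent(2.319, 2.266)  # about 2.29
```

Both reductions are modest in relative terms, which is worth keeping in mind when comparing the RMSE columns across tables.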

Model	RMSE	Accuracy/%
AdaBoost	2.843	85.40
Stacking	2.832	91.20
Bagging	2.337	94.10
GBDT	2.287	94.72
TenrepNN	2.266	94.51
