Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (10): 3107-3113. DOI: 10.11772/j.issn.1001-9081.2022091454
Special Issue: Artificial Intelligence
Jingtao ZHAO1,2, Zefang ZHAO1,2, Zhaojuan YUE1, Jun LI1,2
Received: 2022-09-30
Revised: 2022-12-15
Accepted: 2023-01-05
Online: 2023-03-17
Published: 2023-10-10
Contact: Jun LI
About author: ZHAO Jingtao, born in 1998, M.S. candidate. His research interests include recommendation systems and machine learning.
Jingtao ZHAO, Zefang ZHAO, Zhaojuan YUE, Jun LI. TenrepNN: practice of new ensemble learning paradigm in enterprise self-discipline evaluation[J]. Journal of Computer Applications, 2023, 43(10): 3107-3113.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022091454
| Attribute | Description | Attribute | Description |
| --- | --- | --- | --- |
| A1 | Number of user complaints | A9 | Company type |
| A2 | Response rate to user complaints | A10 | Staff size |
| A3 | Listing status | A11 | Number of insured employees |
| A4 | Nature of enterprise | A12 | Number of administrative licenses |
| A5 | Place of registration | A13 | Number of judicial cases |
| A6 | Operating status | A14 | Number of administrative penalties |
| A7 | Date of establishment | A15 | Number of trustworthiness incentives |
| A8 | Registered capital | A16 | Number of dishonesty sanctions |

Tab. 1 Information of the enterprise self-discipline evaluation dataset
| Model | Error | Model | Error |
| --- | --- | --- | --- |
| LightGBM | 5.44 | LightGBMLarge | 5.93 |
| CatBoost | 5.54 | NeuralNetTorch | 6.01 |
| XGBoost | 5.60 | RandomForestMSE | 6.11 |
| LightGBMXT | 5.83 | ExtraTreesMSE | 6.26 |

Tab. 2 Performance of different models in the AutoGluon framework
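As a reference for reproducing Tab. 2, the following is a minimal sketch of benchmarking AutoGluon's built-in models on a tabular regression task; the label column name and file paths are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: per-model leaderboard from AutoGluon on a tabular
# regression task. The label column name ("score") and CSV paths are
# hypothetical placeholders.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")  # hypothetical training file
test_data = TabularDataset("test.csv")    # hypothetical held-out file

predictor = TabularPredictor(
    label="score",                         # assumed target-column name
    problem_type="regression",
    eval_metric="root_mean_squared_error",
).fit(train_data)

# Per-model scores on the held-out data, comparable to the errors in Tab. 2
print(predictor.leaderboard(test_data))
```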
| Self-discipline score | Grade | Description |
| --- | --- | --- |
| 90~100 | AAA | Good self-discipline: the enterprise controls the quality of online cultural information well and resolves problems quickly |
| 80~89 | AA | |
| 70~79 | A | Average self-discipline: the enterprise can handle problems arising in online culture, but not in a timely manner |
| 60~69 | B | |
| 0~59 | C | Poor self-discipline: the enterprise is in poor operating condition and cannot handle problems arising in online culture |

Tab. 3 Classification of enterprise self-discipline levels
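The score-to-grade mapping of Tab. 3 can be encoded directly; the function name below is illustrative.

```python
def self_discipline_grade(score: float) -> str:
    """Map an enterprise self-discipline score (0-100) to a Tab. 3 grade."""
    if score >= 90:
        return "AAA"
    if score >= 80:
        return "AA"
    if score >= 70:
        return "A"
    if score >= 60:
        return "B"
    return "C"

assert self_discipline_grade(94.5) == "AAA"
assert self_discipline_grade(59.9) == "C"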
| Model | Hyperparameter settings |
| --- | --- |
| CatBoost | learning rate 0.1; maximum depth 3; loss function RMSE; early stopping rounds 100 |
| LightGBM | learning rate 0.1; maximum depth 5; minimum samples per leaf 15; number of leaves 14; feature subsampling ratio 0.95; regularization weight 0.01; bagging fraction 0.6 |
| XGBoost | learning rate 0.07; maximum depth 3; gamma 0.5; maximum number of leaves 5; L1 regularization weight 0.5; L2 regularization weight 0.3 |

Tab. 4 Hyperparameter settings of the base models
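For reference, a sketch of instantiating the three first-level base regressors with the Tab. 4 values follows; the parameter-name mappings (e.g. which LightGBM regularization weight "0.01" refers to) are best-guess assumptions, not confirmed by the paper.

```python
# Sketch of the three base regressors configured per Tab. 4.
from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

catboost_model = CatBoostRegressor(
    learning_rate=0.1,
    depth=3,
    loss_function="RMSE",
    early_stopping_rounds=100,
    verbose=False,
)

lightgbm_model = LGBMRegressor(
    learning_rate=0.1,
    max_depth=5,
    min_child_samples=15,   # minimum samples per leaf
    num_leaves=14,
    colsample_bytree=0.95,  # feature subsampling ratio
    reg_lambda=0.01,        # assumed L2; the paper says only "regularization weight"
    subsample=0.6,          # bagging fraction
    subsample_freq=1,       # required for subsample to take effect in LightGBM
)

xgboost_model = XGBRegressor(
    learning_rate=0.07,
    max_depth=3,
    gamma=0.5,
    max_leaves=5,
    reg_alpha=0.5,          # L1 regularization weight
    reg_lambda=0.3,         # L2 regularization weight
)
```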
| Hyperparameter | Value |
| --- | --- |
| Input feature dimension | 15 |
| Hidden-layer neurons | 10 |
| Output dimension | 15 |
| Batch size | 128 |
| Learning rate | 0.001 |
| L2 regularization coefficient | 0.0001 |
| Training epochs | 50 |

Tab. 5 Hyperparameter settings of the residual prediction neural network
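A minimal PyTorch sketch matching the dimensions in Tab. 5 is shown below; the section does not spell out the activation function or exact layer wiring, so the single ReLU hidden layer is an assumption.

```python
# Sketch of a small network with the Tab. 5 dimensions (15 -> 10 -> 15).
import torch
from torch import nn

class ResidualPredictionNet(nn.Module):
    def __init__(self, in_dim: int = 15, hidden: int = 10, out_dim: int = 15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),  # assumed activation, not stated in the table
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = ResidualPredictionNet()
# Adam with the learning rate and L2 coefficient (weight_decay) of Tab. 5;
# training would then run for 50 epochs with mini-batches of 128 samples.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```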
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| Linear regression | 2.553 | 93.06 |
| Ridge regression | 2.553 | 93.27 |
| Random forest | 2.446 | 92.65 |
| Neural network | 2.515 | 91.82 |
| XGBoost | 2.400 | 93.48 |
| LightGBM | 2.395 | 92.24 |
| CatBoost | 2.363 | 93.69 |

Tab. 6 Performance of single models on the enterprise self-discipline evaluation dataset
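The two metrics reported in Tabs. 6-10 can be computed as below; reading "accuracy" as the share of samples whose predicted grade (per Tab. 3) matches the true grade is our assumption about the evaluation protocol.

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error of predicted self-discipline scores."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def grade_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of samples whose predicted Tab. 3 grade matches the true grade."""
    to_grade = np.vectorize(self_discipline_grade)  # defined after Tab. 3 above
    return float(np.mean(to_grade(y_true) == to_grade(y_pred)))
```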
| Second-level model | RMSE | Accuracy/% |
| --- | --- | --- |
| Linear regression | 2.369 | 92.86 |
| CatBoost | 2.371 | 93.79 |
| XGBoost | 2.465 | 93.79 |
| LightGBM | 2.413 | 91.82 |
| Conventional neural network | 2.330 | 93.79 |
| Residual prediction neural network | 2.266 | 94.51 |

Tab. 7 Results under different second-level models
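As one plausible reading of the two-level design compared in Tab. 7 (an assumption-laden outline, not the paper's exact TenrepNN algorithm), the first level's base predictions feed a second-level model, and with the residual variant the network predicts a correction that is added to the combined base prediction.

```python
# Illustrative sketch of a two-level ensemble in the spirit of Tab. 7.
import numpy as np

def two_level_predict(base_models, second_level, X):
    """First level: base regressors; second level: residual correction."""
    base_preds = np.column_stack([m.predict(X) for m in base_models])
    first_level = base_preds.mean(axis=1)        # simple combination (assumed)
    residual = second_level.predict(base_preds)  # predicted correction
    return first_level + residual
```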
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| AutoGluon | 2.357 | 93.79 |
| TenrepNN | 2.266 | 94.51 |

Tab. 8 Comparison of the proposed model with the AutoML framework AutoGluon
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| Without the residual prediction neural network | 2.319 | 93.79 |
| TenrepNN | 2.266 | 94.51 |

Tab. 9 Ablation results for the residual prediction neural network
| Model | RMSE | Accuracy/% |
| --- | --- | --- |
| AdaBoost | 2.843 | 85.40 |
| Stacking | 2.832 | 91.20 |
| Bagging | 2.337 | 94.10 |
| GBDT | 2.287 | 94.72 |
| TenrepNN | 2.266 | 94.51 |

Tab. 10 Comparison of the proposed model with other ensemble learning methods
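The baseline ensembles of Tab. 10 can be reproduced with scikit-learn's stock implementations, as sketched below; whether the paper used these exact classes and defaults is an assumption.

```python
# Sketch of the Tab. 10 baseline ensembles using scikit-learn.
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression

baselines = {
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Bagging": BaggingRegressor(random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "Stacking": StackingRegressor(
        estimators=[
            ("lgbm", lightgbm_model),  # base models from the Tab. 4 sketch
            ("xgb", xgboost_model),
            ("cat", catboost_model),
        ],
        final_estimator=LinearRegression(),
    ),
}
```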