[1] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
[2] KROGH A, VEDELSBY J. Neural network ensembles, cross validation, and active learning[C]//Proceedings of the 7th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1994: 231-238.
[3] LI H, XU Z, TAYLOR G, et al. Visualizing the loss landscape of neural nets[C]//Proceedings of the 2018 Conference on Neural Information Processing Systems. 2018: 6389-6399.
[4] KEARNS M, VALIANT L. Cryptographic limitations on learning Boolean formulae and finite automata[J]. Journal of the ACM, 1994, 41(1): 67-95.
[5] SCHAPIRE R E. The strength of weak learnability[J]. Machine Learning, 1990, 5(2): 197-227.
[6] 金海东, 刘全, 陈冬火. 一种带自适应学习率的综合随机梯度下降Q-学习方法[J]. 计算机学报, 2019, 42(10): 2203-2215. (JIN H D, LIU Q, CHEN D H. Adaptive learning-rate on integrated stochastic gradient decreasing Q-learning[J]. Chinese Journal of Computers, 2019, 42(10): 2203-2215.)
[7] ZOPH B, LE Q V. Neural architecture search with reinforcement learning[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1611.01578.pdf.
[8] ZOPH B, VASUDEVAN V, SHLENS J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8697-8710.
[9] LIU C, ZOPH B, NEUMANN M, et al. Progressive neural architecture search[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11205. Cham: Springer, 2018: 19-35.
[10] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986, 323(6088): 533-536.
[11] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[12] MCMAHAN B, STREETER M. Delay-tolerant algorithms for asynchronous distributed online learning[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2915-2923.
[13] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1412.6980.pdf.
[14] LIU L, JIANG H, HE P, et al. On the variance of the adaptive learning rate and beyond[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1908.03265.pdf.
[15] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1026-1034.
[16] WILSON D R, MARTINEZ T R. The general inefficiency of batch training for gradient descent learning[J]. Neural Networks, 2003, 16(10): 1429-1451.
[17] 朱振国, 田松禄. 基于权值变化的BP神经网络自适应学习率改进研究[J]. 计算机系统应用, 2018, 27(7): 205-210. (ZHU Z G, TIAN S L. Improvement of learning rate of feed forward neural network based on weight gradient[J]. Computer Systems and Applications, 2018, 27(7): 205-210.)
[18] RUDER S. An overview of gradient descent optimization algorithms[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1609.04747.pdf.
[19] 仝卫国, 李敏霞, 张一可. 深度学习优化算法研究[J]. 计算机科学, 2018, 45(11A): 155-159. (TONG W G, LI M X, ZHANG Y K. Research on optimization algorithm of deep learning[J]. Computer Science, 2018, 45(11A): 155-159.)