Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (10): 2804-2810. DOI: 10.11772/j.issn.1001-9081.2020020237

• Artificial Intelligence •


Improved AdaNet based on adaptive learning rate optimization

LIU Ran1,2,3,4, LIU Yu1,2,3,4, GU Jinguang1,2,3,4   

  1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
    2. Key Laboratory of Intelligent Information Processing and Real-time Industrial System in Hubei Province (Wuhan University of Science and Technology), Wuhan Hubei 430065, China;
    3. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan Hubei 430065, China;
    4. Key Laboratory of Rich-media Knowledge Organization and Service of Digital Publishing Content, National Press and Publication Administration(Wuhan University of Science and Technology), Beijing 100038, China
  • Received: 2020-03-05  Revised: 2020-05-25  Online: 2020-10-10  Published: 2020-06-24
  • Corresponding author: LIU Yu
  • About the authors: LIU Ran (1996-), male, born in Tianmen, Hubei, is an M. S. candidate; his research interests include machine learning, deep learning and automated machine learning. LIU Yu (1980-), male, born in Wuhan, Hubei, Ph. D., is an associate professor; his research interests include knowledge engineering, intelligent systems and distributed computing. GU Jinguang (1974-), male, born in Wuhan, Hubei, Ph. D., is a professor and CCF member; his research interests include the Semantic Web, novel grid computing and intelligent information processing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (U1836118, 61673004), the New Generation Information Technology Innovation Project of the Ministry of Education (2018A03025), and the Major Plan of the National Social Science Foundation of China (11&ZD189).


Abstract: AdaNet (Adaptive structural learning of artificial neural Networks) is a neural architecture search framework based on Boosting ensemble learning, which creates high-quality models by ensembling subnetworks. The subnets generated by the existing AdaNet differ little from one another, which limits the reduction of the generalization error in ensemble learning. In the two steps of AdaNet, setting the subnet weights and ensembling the subnets, adaptive learning rate methods such as Adagrad, RMSProp (Root Mean Square Prop), Adam and RAdam (Rectified Adam) were used to improve the existing optimization algorithm in AdaNet. The improved optimization algorithms were able to apply different degrees of learning rate scaling to parameters of different dimensions, yielding a more dispersed weight distribution, which increased the diversity of the subnets generated by AdaNet and thereby reduced the generalization error of ensemble learning. Experimental results show that on three datasets, MNIST (Mixed National Institute of Standards and Technology database), Fashion-MNIST and Fashion-MNIST with Gaussian noise, the improved optimization algorithms improve the search speed of AdaNet, and the more diverse subnets generated by the method improve the performance of the ensemble model. In terms of F1 score, a metric for evaluating model performance, the improved methods achieve maximum improvements of 0.28%, 1.05% and 1.10% over the original method on the three datasets, respectively.
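To make the mechanism behind the improvement concrete, the following is a minimal NumPy sketch (not taken from the paper; the function names sgd_update and adam_update are illustrative) contrasting plain SGD, which applies one global learning rate to every dimension, with an Adam-style update, which rescales each dimension by its own running gradient statistics. This per-dimension scaling is what the abstract credits for the more dispersed weight distributions and hence more diverse subnets.

```python
import numpy as np

def sgd_update(w, grad, lr=0.01):
    # Plain SGD: every dimension is scaled by the same global learning rate.
    return w - lr * grad

def adam_update(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: each dimension gets its own effective step size,
    lr / (sqrt(v_hat) + eps), so dimensions with small gradients still
    move a meaningful distance, unlike under plain SGD."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-dimension scaling
    return w, (m, v, t)

# Toy example: the second dimension receives gradients 100 times smaller.
grad = np.array([1.0, 0.01])

w_adam, state = np.zeros(2), (np.zeros(2), np.zeros(2), 0)
w_sgd = np.zeros(2)
for _ in range(100):
    w_adam, state = adam_update(w_adam, grad, state)
    w_sgd = sgd_update(w_sgd, grad)

print(w_adam)  # both coordinates have moved a comparable distance
print(w_sgd)   # the second coordinate has barely moved under one global rate
```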

Key words: AdaNet, Neural Architecture Search (NAS), ensemble learning, adaptive learning rate method, Automated Machine Learning (AutoML)
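As a further illustration of where the optimizer choice enters the search, the sketch below mimics, at a very high level, an AdaNet-style boosting iteration using standard tf.keras rather than the adanet library: in each round, candidate subnets of growing depth are trained with different adaptive optimizers (Adagrad, RMSProp, Adam; RAdam is omitted because it requires an add-on package), and the best candidate is added to the ensemble. The names build_subnet, search and ensemble_predict are invented for this sketch, and the selection criterion here is plain validation accuracy, whereas the real AdaNet objective also penalizes subnet complexity and learns mixture weights.

```python
import tensorflow as tf

# Candidate optimizers with per-dimension learning rate scaling.
CANDIDATE_OPTIMIZERS = {
    "adagrad": tf.keras.optimizers.Adagrad,
    "rmsprop": tf.keras.optimizers.RMSprop,
    "adam": tf.keras.optimizers.Adam,
}

def build_subnet(depth, width=64):
    """A small fully connected candidate subnetwork for 28x28 inputs."""
    layers = [tf.keras.layers.Flatten(input_shape=(28, 28))]
    layers += [tf.keras.layers.Dense(width, activation="relu") for _ in range(depth)]
    layers += [tf.keras.layers.Dense(10, activation="softmax")]
    return tf.keras.Sequential(layers)

def search(x_train, y_train, x_val, y_val, rounds=3):
    """Boosting-style loop: per round, train candidate subnets of increasing
    depth with different adaptive optimizers, keep the best one by validation
    accuracy, and add it to the ensemble."""
    ensemble = []
    for r in range(1, rounds + 1):
        best_acc, best_subnet = -1.0, None
        for name, opt_cls in CANDIDATE_OPTIMIZERS.items():
            subnet = build_subnet(depth=r)
            subnet.compile(optimizer=opt_cls(),
                           loss="sparse_categorical_crossentropy",
                           metrics=["accuracy"])
            subnet.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
            _, acc = subnet.evaluate(x_val, y_val, verbose=0)
            if acc > best_acc:
                best_acc, best_subnet = acc, subnet
        ensemble.append(best_subnet)
    return ensemble

def ensemble_predict(ensemble, x):
    # Uniformly averaged subnet outputs; AdaNet itself learns mixture weights.
    probs = sum(m.predict(x, verbose=0) for m in ensemble) / len(ensemble)
    return probs.argmax(axis=-1)
```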
