[1] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
[2] KROGH A, VEDELSBY J. Neural network ensembles, cross validation, and active learning[C]//Proceedings of the 7th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 1994: 231-238.
[3] LI H, XU Z, TAYLOR G, et al. Visualizing the loss landscape of neural nets[C]//Proceedings of the 2018 Conference on Neural Information Processing Systems. 2018: 6389-6399.
[4] KEARNS M, VALIANT L. Cryptographic limitations on learning Boolean formulae and finite automata[J]. Journal of the ACM, 1994, 41(1): 67-95.
[5] SCHAPIRE R E. The strength of weak learnability[J]. Machine Learning, 1990, 5(2): 197-227.
[6] 金海东, 刘全, 陈冬火. 一种带自适应学习率的综合随机梯度下降Q-学习方法[J]. 计算机学报, 2019, 42(10): 2203-2215. (JIN H D, LIU Q, CHEN D H. Adaptive learning-rate on integrated stochastic gradient decreasing Q-learning[J]. Chinese Journal of Computers, 2019, 42(10): 2203-2215.)
[7] ZOPH B, LE Q V. Neural architecture search with reinforcement learning[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1611.01578.pdf.
[8] ZOPH B, VASUDEVAN V, SHLENS J, et al. Learning transferable architectures for scalable image recognition[C]//Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8697-8710.
[9] LIU C, ZOPH B, NEUMANN M, et al. Progressive neural architecture search[C]//Proceedings of the 2018 European Conference on Computer Vision, LNCS 11205. Cham: Springer, 2018: 19-35.
[10] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986, 323(6088): 533-536.
[11] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12: 2121-2159.
[12] MCMAHAN B, STREETER M. Delay-tolerant algorithms for asynchronous distributed online learning[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2014: 2915-2923.
[13] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1412.6980.pdf.
[14] LIU L, JIANG H, HE P, et al. On the variance of the adaptive learning rate and beyond[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1908.03265.pdf.
[15] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of the 2015 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 1026-1034.
[16] WILSON D R, MARTINEZ T R. The general inefficiency of batch training for gradient descent learning[J]. Neural Networks, 2003, 16(10): 1429-1451.
[17] 朱振国, 田松禄. 基于权值变化的BP神经网络自适应学习率改进研究[J]. 计算机系统应用, 2018, 27(7): 205-210. (ZHU Z G, TIAN S L. Improvement of learning rate of feed forward neural network based on weight gradient[J]. Computer Systems and Applications, 2018, 27(7): 205-210.)
[18] RUDER S. An overview of gradient descent optimization algorithms[EB/OL]. [2020-02-25]. https://arxiv.org/pdf/1609.04747.pdf.
[19] 仝卫国, 李敏霞, 张一可. 深度学习优化算法研究[J]. 计算机科学, 2018, 45(11A): 155-159. (TONG W G, LI M X, ZHANG Y K. Research on optimization algorithm of deep learning[J]. Computer Science, 2018, 45(11A): 155-159.)