[1] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors [J]. Nature, 1986, 323(6088): 533-536.
[2] BALDI P, HORNIK K. Neural networks and principal component analysis: learning from examples without local minima [J]. Neural Networks, 1989, 2(1): 53-58.
[3] BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks [C]//Proceedings of the 20th Annual Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2006: 153-160.
[4] BENGIO Y. Learning deep architectures for AI [J]. Foundations and Trends® in Machine Learning, 2009, 2(1): 1-127.
[5] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders [C]//Proceedings of the 25th International Conference on Machine Learning. New York: ACM, 2008: 1096-1103.
[6] CHEN M, WEINBERGER K, SHA F, et al. Marginalized denoising auto-encoders for nonlinear representations [C]//Proceedings of the 31st International Conference on Machine Learning. New York: ACM, 2014: 1476-1484.
[7] VINCENT P, LAROCHELLE H, LAJOIE I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion [J]. Journal of Machine Learning Research, 2010, 11(6): 3371-3408.
[8] LeCUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[9] FARABET C, COUPRIE C, NAJMAN L, et al. Learning hierarchical features for scene labeling [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1915-1929.
[10] MOHAMED A, DAHL G E, HINTON G. Acoustic modeling using deep belief networks [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 14-22.
[11] LeCUN Y, BOTTOU L, ORR G B, et al. Efficient BackProp [M]//ORR G B, MÜLLER K-R. Neural Networks: Tricks of the Trade, LNCS 1524. Berlin: Springer, 1998: 9-50.
[12] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets [J]. Neural Computation, 2006, 18(7): 1527-1554.
[13] JAITLY N, HINTON G E. Using an autoencoder with deformable templates to discover features for automated speech recognition [EB/OL]. [2015-04-07]. http://www.cs.toronto.edu/~ndjaitly/jaitly-interspeech13.pdf.
[14] TSURUOKA Y, TSUJII J, ANANIADOU S. Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty [EB/OL]. [2015-04-07]. http://aye.comp.nus.edu.sg/~antho/P/P09/P09-1054.pdf.
[15] LE Q V, NGIAM J, COATES A, et al. On optimization methods for deep learning [EB/OL]. [2015-04-07]. http://ai.stanford.edu/~ang/papers/icml11-OptimizationForDeepLearning.pdf.