[1] LI J Y, DENG L, GONG Y F, et al. An overview of noise-robust automatic speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(4):745-777.
[2] HIMAWAN I, MOTLICEK P, IMSENG D, et al. Learning feature mapping using deep neural network bottleneck features for distant large vocabulary speech recognition[C]//Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway, NJ:IEEE, 2015:4540-4544.
[3] HAN K, HE Y Z, BAGCHI D, et al. Deep neural network based spectral feature mapping for robust speech recognition[C]//Proceedings of the 2015 16th Annual Conference of the International Speech Communication Association. Grenoble, France:ISCA, 2015:2484-2488.
[4] REHR R, GERKMANN T. Cepstral noise subtraction for robust automatic speech recognition[C]//Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ:IEEE, 2015:375-378.
[5] WANG D, ZHANG X W. THCHS-30:a free Chinese speech corpus[EB/OL].[2017-10-16]. http://pdfs.semanticscholar.org/207e/c1b9457c1e42f34d331cf2a7bc791358b9cd.pdf.
[6] LIPPMANN R, MARTIN E, PAUL D. Multi-style training for robust isolated-word speech recognition[C]//Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ:IEEE, 2003:705-708.
[7] 易克初,田斌,付强.语音信号处理[M].北京:国防工业出版社,2000:210-242.(YI K C, TIAN B, FU Q. Speech Signal Processing[M]. Beijing:National Defense Industry Press, 2000:210-242.)
[8] 张仕良.基于深度神经网络的语音识别模型研究[D].合肥:中国科学技术大学,2017:1-4.(ZHANG S L. Research on deep neural network based models for speech recognition[D]. Hefei:University of Science and Technology of China, 2017:1-4.)
[9] DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1):30-42.
[10] GAO T, DU J, DAI L R, et al. Joint training of front-end and back-end deep neural networks for robust speech recognition[C]//Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ:IEEE, 2015:4375-4379.
[11] MA L, MILNER B, SMITH D. Acoustic environment classification[J]. ACM Transactions on Speech and Language Processing, 2006, 3(2):1-22.
[12] CHU S, NARAYANAN S, KUO C C J. Environmental sound recognition with time-frequency audio features[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2009, 17(6):1142-1158.
[13] XUE X B, ZHOU Z H. Distributional features for text categorization[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(3):428-442.
[14] 周志华.机器学习[M].北京:清华大学出版社,2016:121-145.(ZHOU Z H. Machine Learning[M]. Beijing:Tsinghua University Press, 2016:121-145.)
[15] PHILBIN J, CHUM O, ISARD M, et al. Object retrieval with large vocabularies and fast spatial matching[C]//Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway, NJ:IEEE, 2007:1-8.
[16] LIANG J W, JIN Q, HE X X, et al. Semantic concept annotation of consumer videos at frame-level using audio[C]//Proceedings of the 2014 15th Pacific-Rim Conference on Advances in Multimedia Information Processing, LNCS 8879. Cham:Springer, 2014:113-122.
[17] VESELY K, GHOSHAL A, BURGET L, et al. Sequence-discriminative training of deep neural networks[C]//Proceedings of the 2013 14th Annual Conference of the International Speech Communication Association. Grenoble, France:ISCA, 2013:2345-2349.
[18] 俞栋,邓力.解析深度学习:语音识别实践[M].北京:电子工业出版社,2016:81-85.(YU D, DENG L. Parsing the Deep Learning:Speech Recognition Practices[M]. Beijing:Publishing House of Electronics Industry, 2016:81-85.)