[1] HINTON G, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups[J]. IEEE Signal Processing Magazine, 2012, 29(6):82-97.
[2] VALENTE F, MAGIMAI-DOSS M, WANG W. Analysis and comparison of recent MLP features for LVCSR systems[EB/OL]. [2017-12-11]. http://publications.idiap.ch/downloads/papers/2011/Valente_INTERSPEECH_2011.pdf.
[3] DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1):30-42.
[4] MOHAMED A R, HINTON G, PENN G. Understanding how deep belief networks perform acoustic modelling[C]//ICASSP 2012: Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ: IEEE, 2012:4273-4276.
[5] VESELY K, GHOSHAL A, BURGET L, et al. Sequence-discriminative training of deep neural networks[EB/OL]. [2017-12-11]. https://www.isca-speech.org/archive/archive_papers/interspeech_2013/i13_2345.pdf.
[6] BLASIAK S, RANGWALA H. A hidden Markov model variant for sequence classification[EB/OL]. [2017-12-11]. http://www.ijcai.org/Proceedings/11/Papers/203.pdf.
[7] HAYASHI T, WATANABE S, TODA T, et al. Duration-controlled LSTM for polyphonic sound event detection[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(11):2059-2070.
[8] SAON G, KUO H K J, RENNIE S, et al. The IBM 2015 English conversational telephone speech recognition system[EB/OL]. [2017-12-11]. https://www.isca-speech.org/archive/interspeech_2015/papers/i15_3140.pdf.
[9] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks[EB/OL]. [2017-12-12]. http://web.stanford.edu/class/cs224s/papers/graves06.pdf.
[10] MOHRI M, PEREIRA F, RILEY M. Speech recognition with weighted finite-state transducers[M]//BENESTY J, SONDHI M, HUANG Y A. Springer Handbook of Speech Processing. Berlin: Springer, 2008:559-584.
[11] GRAVES A, MOHAMED A R, HINTON G. Speech recognition with deep recurrent neural networks[C]//Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ: IEEE, 2013:6645-6649.
[12] MORILLOT O, LIKFORMAN-SULEM L. New baseline correction algorithm for text-line recognition with bidirectional recurrent neural networks[J]. Journal of Electronic Imaging, 2013, 22(2):023028.
[13] WÖLLMER M, SCHULLER B, EYBEN F, et al. Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening[J]. IEEE Journal of Selected Topics in Signal Processing, 2010, 4(5):867-881.
[14] SAINATH T N, VINYALS O, SENIOR A, et al. Convolutional, long short-term memory, fully connected deep neural networks[C]//Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ: IEEE, 2015:4580-4584.
[15] MAAS A, XIE Z, JURAFSKY D, et al. Lexicon-free conversational speech recognition with neural networks[EB/OL]. [2017-12-15]. http://www.stanfordlibrary.us/~jurafsky/pubs/N15-1038.pdf.
[16] HANNUN A, CASE C, CASPER J, et al. Deep speech: scaling up end-to-end speech recognition[EB/OL]. [2017-12-15]. http://web.stanford.edu/class/cs224s/papers/baidu_speech.pdf.
[17] POVEY D, GHOSHAL A, BOULIANNE G, et al. The Kaldi speech recognition toolkit[EB/OL]. [2017-12-20]. http://homepages.inf.ed.ac.uk/aghoshal/pubs/asru11-kaldi.pdf.
[18] DROSTE M, KUICH W, VOGLER H. Handbook of weighted automata[J]. Monographs in Theoretical Computer Science. An EATCS Series, 2009, 380(1/2):69-86.