[1] JELINEK F. Continuous speech recognition by statistical methods [J]. Proceedings of the IEEE, 1976, 64(4): 532-556.
[2] HANNUN A, CASE C, CASPER J, et al. Deep Speech: scaling up end-to-end speech recognition [EB/OL]. [2023-03-04].
[3] GRAVES A, FERNÁNDEZ S, GOMEZ F, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks [C]// Proceedings of the 23rd International Conference on Machine Learning. New York: ACM, 2006: 369-376.
[4] SHAN C, ZHANG J, WANG Y, et al. Attention-based end-to-end speech recognition on voice search [C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 4764-4768.
[5] GRAVES A. Sequence transduction with recurrent neural networks [EB/OL]. [2021-11-08].
[6] GONG C, TAN X, HE D, et al. Sentence-wise smooth regularization for sequence to sequence learning [C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2019: 6449-6456.
[7] SENNRICH R, HADDOW B, BIRCH A. Neural machine translation of rare words with subword units [C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2016: 1715-1725.
[8] BAZZI I. Modeling out-of-vocabulary words for robust speech recognition [D]. Cambridge: Massachusetts Institute of Technology, 2002: 47-79.
[9] WANG C, CHO K, GU J. Neural machine translation with byte-level subwords [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 9154-9160.
[10] DENG L, HSIAO R, GHOSHAL A. Bilingual end-to-end ASR with byte-level subwords [C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 6417-6421.
[11] YAO Z, GUO L, YANG X, et al. Zipformer: a faster and better encoder for automatic speech recognition [EB/OL]. [2024-06-20].
[12] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks [C]// Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. Cambridge: MIT Press, 2014: 3104-3112.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[14] BU H, DU J, NA X, et al. AISHELL-1: an open-source Mandarin speech corpus and a speech recognition baseline [C]// Proceedings of the 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment. Piscataway: IEEE, 2017: 1-5.
[15] GULATI A, QIN J, CHIU C C, et al. Conformer: convolution-augmented transformer for speech recognition [C]// Proceedings of the INTERSPEECH 2020. [S.l.]: International Speech Communication Association, 2020: 5036-5040.
[16] XU H K, LU J K, ZHANG Z F, et al. Chinese speech recognition combining Conformer and N-gram [J]. Computer Systems and Applications, 2022, 31(7): 194-202. (in Chinese)
[17] BA J L, KIROS J R, HINTON G E. Layer normalization [EB/OL]. [2023-05-03].
[18] KINGMA D P, BA J L. Adam: a method for stochastic optimization [EB/OL]. [2023-04-18].
[19] GHODSI M, LIU X, APFEL J, et al. RNN-Transducer with stateless prediction network [C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 7049-7053.
[20] CHEN G, XIE X K, SUN J, et al. Hybrid CTC/Attention end-to-end Chinese speech recognition enhanced by Conformer [J]. Computer Engineering and Applications, 2023, 59(4): 97-103. (in Chinese)
[21] POVEY D, GHOSHAL A, BOULIANNE G, et al. The Kaldi speech recognition toolkit [EB/OL]. [2023-04-18].
[22] Hangzhou Dianzi University. Method for voiceprint recognition based on fusion of Fbank features and MFCC features: 202110586134.6 [P]. 2021-09-14. (in Chinese)
[23] KO T, PEDDINTI V, POVEY D, et al. Audio augmentation for speech recognition [C]// Proceedings of the INTERSPEECH 2015. [S.l.]: International Speech Communication Association, 2015: 3586-3589.
[24] PARK D S, CHAN W, ZHANG Y, et al. SpecAugment: a simple data augmentation method for automatic speech recognition [C]// Proceedings of the INTERSPEECH 2019. [S.l.]: International Speech Communication Association, 2019: 2613-2617.
[25] MICIKEVICIUS P, NARANG S, ALBEN J, et al. Mixed precision training [EB/OL]. [2023-06-18].
[26] JAIN M, SCHUBERT K, MAHADEOKAR J, et al. RNN-T for latency controlled ASR with improved beam search [EB/OL]. [2023-10-21].
[27] WAIBEL A, HANAZAWA T, HINTON G, et al. Phoneme recognition using time-delay neural networks [M]// CHAUVIN Y, RUMELHART D E. Backpropagation: theory, architectures, and applications. New York: Psychology Press, 1995: 35-61.