[1] MIAO Y, GOWAYYED M, METZE F, et al. EESEN: end-to-end speech recognition using deep RNN models and WFST-based decoding[C]//Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding. Piscataway: IEEE, 2015: 167-174.
[2] WATANABE S, HORI T, KARITA S, et al. ESPnet: end-to-end speech processing toolkit[EB/OL]. [2019-05-17]. https://www.isca-speech.org/archive/Interspeech_2018/pdfs/1456.pdf.
[3] POVEY D, GHOSHAL A, BOULIANNE G, et al. The Kaldi speech recognition toolkit[EB/OL]. [2019-05-17]. http://rmozone.com/snapshots/2015/07/cdg-room-refs/2011_asru_kaldi.pdf.
[4] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[EB/OL]. [2019-05-19]. https://arxiv.org/pdf/1409.0473.pdf.
[5] KALCHBRENNER N, ESPEHOLT L, SIMONYAN K, et al. Neural machine translation in linear time[EB/OL]. [2019-05-17]. https://arxiv.org/pdf/1610.10099.pdf.
[6] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st Annual Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2017: 6000-6010.
[7] LEVY T, SILBER-VAROD V, MOYAL A. The effect of pitch, intensity and pause duration in punctuation detection[C]//Proceedings of the IEEE 27th Convention of Electrical and Electronics Engineers in Israel. Piscataway: IEEE, 2012: 1-4.
[8] CHO E, NIEHUES J, WAIBEL A. NMT-based segmentation and punctuation insertion for real-time spoken language translation[EB/OL]. [2019-12-02]. https://www.isca-speech.org/archive/Interspeech_2017/pdfs/1320.PDF.
[9] YI J, TAO J. Self-attention based model for punctuation prediction using word and speech embeddings[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2019: 7270-7274.
[10] CHE X, WANG C, YANG H, et al. Punctuation prediction for unsegmented transcript based on word vector[C]//Proceedings of the 10th International Conference on Language Resources and Evaluation. Stroudsburg, PA: Association for Computational Linguistics, 2016: 654-658.
[11] TILK O, ALUMÄE T. LSTM for punctuation restoration in speech transcripts[EB/OL]. [2019-05-19]. https://www.isca-speech.org/archive/interspeech_2015/papers/i15_0683.pdf.
[12] TILK O, ALUMÄE T. Bidirectional recurrent neural network with attention mechanism for punctuation restoration[EB/OL]. [2019-05-17]. https://www.isca-speech.org/archive/Interspeech_2016/pdfs/1517.PDF.
[13] LI Y K, PAN Q, WANG E X. Joint Chinese word segmentation and punctuation prediction based on improved multilayer BLSTM network[J]. Journal of Computer Applications, 2018, 38(5): 1278-1282, 1314. (in Chinese)
[14] CHO E, HA T L, WAIBEL A. CRF-based disfluency detection using semantic features for German to English spoken language translation[EB/OL]. [2019-05-19]. http://www.mt-archive.info/10/IWSLT-2013-Cho.pdf.
[15] ZAYATS V, OSTENDORF M, HAJISHIRZI H. Disfluency detection using a bidirectional LSTM[EB/OL]. [2019-05-19]. https://www.isca-speech.org/archive/Interspeech_2016/pdfs/1247.PDF.
[16] LOU P J, ANDERSON P, JOHNSON M. Disfluency detection using auto-correlational neural networks[C]//Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, PA: Association for Computational Linguistics, 2018: 4610-4619.
[17] SARMA A, PALMER D D. Context-based speech recognition error detection and correction[C]//Proceedings of the 2004 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2004: 85-88.
[18] GUO J, SAINATH T N, WEISS R J. A spelling correction model for end-to-end speech recognition[C]//Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2019: 5651-5655.
[19] XIE Z, AVATI A, ARIVAZHAGAN N, et al. Neural language correction with character-based attention[EB/OL]. [2019-05-19]. https://arxiv.org/pdf/1603.09727.pdf.
[20] DEVLIN J, CHANG M, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[C]//Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA: Association for Computational Linguistics, 2019: 4171-4186.
[21] LIN T Y, GOYAL P, GIRSHICK R, et al. Focal Loss for dense object detection[C]//Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2999-3007.
[22] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, 2002: 311-318.