1 高长丰, 程高峰, 张鹏远. 面向鲁棒自动语音识别的一致性自监督学习方法[J]. 声学学报, 2023, 48(3): 578-587.
GAO C F, CHENG G F, ZHANG P Y. Consistency self-supervised learning method for robust automatic speech recognition[J]. Acta Acustica, 2023, 48(3): 578-587.
2 ZHONG X, DAI Y, DAI Y, et al. Study on processing of wavelet speech denoising in speech recognition system[J]. International Journal of Speech Technology, 2018, 21: 563-569.
3 PENG R, TAN Z-H, LI X, et al. A perceptually motivated LP residual estimator in noisy and reverberant environments[J]. Speech Communication, 2018, 96: 129-141.
4 HU Y, LOIZOU P C. A generalized subspace approach for enhancing speech corrupted by colored noise[J]. IEEE Transactions on Speech and Audio Processing, 2003, 11(4): 334-341.
5 蓝天, 彭川, 李森, 等. 单声道语音降噪与去混响研究综述[J]. 计算机研究与发展, 2020, 57(5): 928-953.
LAN T, PENG C, LI S, et al. An overview of monaural speech denoising and dereverberation research[J]. Journal of Computer Research and Development, 2020, 57(5): 928-953.
6 LUO Y, MESGARANI N. TasNet: time-domain audio separation network for real-time, single-channel speech separation[C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 696-700.
7 GAO T, DU J, DAI L-R, et al. Densely connected progressive learning for LSTM-based speech enhancement[C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 5054-5058.
8 ROUTRAY S, MAO Q. Phase sensitive masking-based single channel speech enhancement using conditional generative adversarial network[J]. Computer Speech & Language, 2022, 71: 101270.
9 YU W, ZHOU J, WANG H B, et al. SETransformer: speech enhancement Transformer[J]. Cognitive Computation, 2022, 14(3): 1152-1158.
10 FU S-W, LIAO C-F, TSAO Y, et al. MetricGAN: generative adversarial networks based black-box metric scores optimization for speech enhancement[C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR.org, 2019: 2031-2041.
11 ZHANG Z, DENG C, SHEN Y, et al. On loss functions and recurrency training for GAN-based speech enhancement systems[C]// Proceedings of the 2020 Interspeech. Baixas, France: International Speech Communication Association, 2020: 3266-3270.
12 NIKZAD M, NICOLSON A, GAO Y, et al. Deep residual-dense lattice network for speech enhancement[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(5): 8552-8559.
13 PASCUAL S, BONAFONTE A, SERRÀ J. SEGAN: speech enhancement generative adversarial network[C]// Proceedings of the 2017 Interspeech. Baixas, France: International Speech Communication Association, 2017: 3642-3646.
14 KIM E, SEO H. SE-Conformer: time-domain speech enhancement using conformer[EB/OL]. [2023-06-20].
15 WANG K, HE B, ZHU W-P. TSTNN: two-stage transformer based neural network for speech enhancement in the time domain[C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 7098-7102.
16 TAN K, WANG D L. Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 28: 380-390.
17 CHOI H-S, KIM J-H, HUH J, et al. Phase-aware speech enhancement with deep complex U-net[C/OL]// Proceedings of the 2019 International Conference on Learning Representations (2019-03-07)[2023-08-01].
18 HU Y, LIU Y, LV S, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement[C]// Proceedings of the 2020 Interspeech. Baixas, France: International Speech Communication Association, 2020: 2472-2476.
19 LI A, ZHENG C, FAN C, et al. A recursive network with dynamic attention for monaural speech enhancement[C]// Proceedings of the 2020 Interspeech. Baixas, France: International Speech Communication Association, 2020: 2422-2426.
20 DÉFOSSEZ A, SYNNAEVE G, ADI Y. Real time speech enhancement in the waveform domain[C]// Proceedings of the 2020 Interspeech. Baixas, France: International Speech Communication Association, 2020: 3291-3295.
21 HUANG Z, WATANABE S, YANG S-W, et al. Investigating self-supervised learning for speech enhancement and separation[C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 6837-6841.
22 LI A, LIU W, ZHENG C, et al. Two heads are better than one: a two-stage complex spectral mapping approach for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 1829-1843.
23 HAO X, SU X, WEN S, et al. Masking and inpainting: a two-stage speech enhancement approach for low SNR and non-stationary noise[C]// Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2020: 6959-6963.
24 WANG H, WANG D L. Neural cascade architecture with triple-domain loss for speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 30: 734-743.
25 范君怡, 杨吉斌, 张雄伟, 等. U-net网络中融合多头注意力机制的单通道语音增强[J]. 声学学报, 2022, 47(6): 703-716.
FAN J Y, YANG J B, ZHANG X W, et al. Monaural speech enhancement using U-net fused with multi-head self-attention[J]. Acta Acustica, 2022, 47(6): 703-716.
26 JU Y, RAO W, YAN X, et al. TEA-PSE: Tencent-ethereal-audio-lab personalized speech enhancement system for ICASSP 2022 DNS challenge[C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 9291-9295.
27 CHEN S, WANG C, CHEN Z, et al. WavLM: large-scale self-supervised pre-training for full stack speech processing[J]. IEEE Journal of Selected Topics in Signal Processing, 2022, 16(6): 1505-1518.
28 WOO S, PARK J, LEE J-Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19.
29 VEAUX C, YAMAGISHI J, KING S. The voice bank corpus: design, collection and data analysis of a large regional accent speech database[C]// Proceedings of the 2013 International Conference Oriental COCOSDA Held Jointly with Conference on Asian Spoken Language Research and Evaluation. Piscataway: IEEE, 2013: 1-4.
30 THIEMANN J, ITO N, VINCENT E. The diverse environments multi-channel acoustic noise database (DEMAND): a database of multichannel environmental noise recordings[J]. Proceedings of Meetings on Acoustics, 2013, 19(1): 035081.
31 MACARTNEY C, WEYDE T. Improved speech enhancement with the Wave-U-Net[EB/OL]. (2018-11-27)[2022-12-15].
32 LI A, ZHENG C, ZHANG L, et al. Glance and gaze: a collaborative learning framework for single-channel speech enhancement[J]. Applied Acoustics, 2022, 187: 108499.
33 余本年, 詹永照, 毛启容, 等. 面向语音增强的双复数卷积注意聚合递归网络[J]. 计算机应用, 2023, 43(10): 3217-3224.
YU B N, ZHAN Y Z, MAO Q R, et al. Double complex convolutional and attention aggregating recurrent network for speech enhancement[J]. Journal of Computer Applications, 2023, 43(10): 3217-3224.