[1] CAI H T, YUAN B T. A speech enhancement algorithm based on masking properties of human auditory system [J]. Journal on Communications, 2002, 23(8): 93-98. (in Chinese)
[2] WANG Y, BROOKES M. Model-based speech enhancement in the modulation domain [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(3): 580-594.
[3] ALMAJAI I, MILNER B. Visually derived Wiener filters for speech enhancement [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(6): 1642-1651.
[4] LAN T, PENG C, LI S, et al. An overview of monaural speech denoising and dereverberation research [J]. Journal of Computer Research and Development, 2020, 57(5): 928-953. (in Chinese)
[5] KRAWCZYK-BECKER M, GERKMANN T. Fundamental frequency informed speech enhancement in a flexible statistical framework [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(5): 940-951.
[6] WOHLMAYR M, STARK M, PERNKOPF F. A probabilistic interaction model for multipitch tracking with factorial hidden Markov models [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 799-810.
[7] MING J, SRINIVASAN R, CROOKES D. A corpus-based approach to speech enhancement from nonstationary noise [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(4): 822-836.
[8] PANDEY A, WANG D. TCNN: temporal convolutional neural network for real-time speech enhancement in the time domain [C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2019: 6875-6879.
[9] LI A, LIU W, ZHENG C, et al. Two heads are better than one: a two-stage complex spectral mapping approach for monaural speech enhancement [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 1829-1843.
[10] GAO T, DU J, XU Y, et al. Improving deep neural network based speech enhancement in low SNR environments [C]// Proceedings of the 2015 International Conference on Latent Variable Analysis and Signal Separation, LNCS 9237. Cham: Springer, 2015: 75-82.
[11] YIN D, LUO C, XIONG Z, et al. PHASEN: a phase-and-harmonics-aware speech enhancement network [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto: AAAI Press, 2020: 9458-9465.
[12] LEE J, KANG H G. Two-stage refinement of magnitude and complex spectra for real-time speech enhancement [J]. IEEE Signal Processing Letters, 2022, 29: 2188-2192.
[13] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
[14] WANG K, HE B, ZHU W P. TSTNN: two-stage Transformer based neural network for speech enhancement in the time domain [C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 7098-7102.
[15] YU G, LI A, WANG H, et al. DBT-Net: dual-branch federative magnitude and phase estimation with attention-in-attention Transformer for monaural speech enhancement [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 2629-2644.
[16] ZHANG Q, SONG Q, NI Z, et al. Time-frequency attention for monaural speech enhancement [C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 7852-7856.
[17] ZHANG T Q, LUO Q Y, ZHANG H Z, et al. Speech enhancement method based on complex spectrum mapping with efficient Transformer [J]. Journal of Signal Processing, 2024, 40(2): 406-416. (in Chinese)
[18] HU Y, LIU Y, LV S, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement [C]// Proceedings of INTERSPEECH 2020. [S.l.]: International Speech Communication Association, 2020: 2472-2476.
[19] ZHANG S, LEI M, YAN Z, et al. Deep-FSMN for large vocabulary continuous speech recognition [C]// Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2018: 5869-5873.
[20] ZHAO S, MA B, WATCHARASUPAT K N, et al. FRCRN: boosting feature representation using frequency recurrence for monaural speech enhancement [C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 9281-9285.
[21] HAN K, WANG Y, CHEN H, et al. A survey on Vision Transformer [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 87-110.
[22] VEAUX C, YAMAGISHI J, KING S. The voice bank corpus: design, collection and data analysis of a large regional accent speech database [C]// Proceedings of the 2013 International Conference of the Oriental COCOSDA held jointly with the 2013 Conference on Asian Spoken Language Research and Evaluation. Piscataway: IEEE, 2013: 1-4.
[23] THIEMANN J, ITO N, VINCENT E. The diverse environments multi-channel acoustic noise database: a database of multichannel environmental noise recordings [J]. The Journal of the Acoustical Society of America, 2013, 133(S5): No.4806631.
[24] GAROFOLO J S, LAMEL L F, FISHER W M, et al. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM: NISTIR 4930 [R/OL]. [2024-12-14]. .
[25] VARGA A, STEENEKEN H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems [J]. Speech Communication, 1993, 12(3): 247-251.
[26] HUANG H X, WU R J, HUANG J, et al. DCCRGAN: deep complex convolution recurrent generator adversarial network for speech enhancement [C]// Proceedings of the 2022 International Symposium on Electrical, Electronics and Information Engineering. Piscataway: IEEE, 2022: 30-35.
[27] LV S, FU Y, XING M, et al. S-DCCRN: super wide band DCCRN with learnable complex feature for speech enhancement [C]// Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2022: 7767-7771.
[28] ZHOU L, GAO Y, WANG Z, et al. Complex spectral mapping with attention based convolution recurrent neural network for speech enhancement [EB/OL]. [2024-12-22]. .
[29] YU G, WANG Y, WANG H, et al. A two-stage complex network using cycle-consistent generative adversarial networks for speech enhancement [J]. Speech Communication, 2021, 134: 42-54.
[30] LI Y, SUN M, ZHANG X. Scale-aware dual-branch complex convolutional recurrent network for monaural speech enhancement [J]. Computer Speech and Language, 2024, 86: No.101618.