Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (4): 1317-1324.DOI: 10.11772/j.issn.1001-9081.2023040452
Special Issue: Multimedia computing and computer simulation
Received: 2023-04-21
Revised: 2023-07-06
Accepted: 2023-07-10
Online: 2023-12-04
Published: 2024-04-10
Contact: Heng WANG
About author: YOU Xinyuan, born in 1998 in Luoyang, Henan, M. S. candidate, CCF member. Her research interests include monaural speech enhancement.
Xinyuan YOU, Heng WANG. Monaural speech enhancement based on gated dilated convolutional recurrent network[J]. Journal of Computer Applications, 2024, 44(4): 1317-1324.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023040452
| Layer | Input size | k | s | d | Output size |
| --- | --- | --- | --- | --- | --- |
| dilated conv2d_1 (×2) | 2×T×161 | (3,3) | (1,2) | 1,5 | 16×T×80 |
| dilated conv2d_2 (×2) | 32×T×80 | (3,3) | (1,2) | 2,2 | 32×T×39 |
| dilated conv2d_3 (×2) | 64×T×39 | (3,3) | (1,2) | 5,1 | 32×T×19 |
| dilated conv2d_4 (×2) | 64×T×19 | (3,3) | (1,2) | 1,5 | 64×T×9 |
| dilated conv2d_5 (×2) | 128×T×9 | (3,3) | (1,2) | 2,2 | 64×T×4 |
| dilated conv2d_6 (×2) | 128×T×4 | (3,3) | (1,2) | 5,1 | 128×T×4 |
| GTCM | 256×T×4 | (1,1), (3,3) | (1,1), (1,2) | 1 | 256×T×4 |
|  | 256×T×4 | (3,1) | (1,1) | 1 | 256×T×4 |
|  |  | (3,1) | (1,1) | 1 |  |
|  |  | (3,1) | (1,1) | 2 |  |
|  |  | (3,1) | (1,1) | 2 |  |
|  |  | (3,1) | (1,1) | 4 |  |
|  |  | (3,1) | (1,1) | 4 |  |
|  |  | (3,1) | (1,1) | 8 |  |
|  |  | (3,1) | (1,1) | 8 |  |
|  | 256×T×4 | (1,1), (3,3) | (1,1), (1,2) | 1 | 256×T×4 |
| deconv2d_glu_6 (×2) | 512×T×4 | (3,1) | (1,1) | 1 | 128×T×4 |
| deconv2d_glu_5 (×2) | 256×T×4 | (3,1) | (1,2) | 1 | 128×T×9 |
| deconv2d_glu_4 (×2) | 256×T×9 | (3,1) | (1,2) | 1 | 64×T×19 |
| deconv2d_glu_3 (×2) | 128×T×9 | (3,1) | (1,2) | 1 | 64×T×39 |
| deconv2d_glu_2 (×2) | 128×T×39 | (3,1) | (1,2) | 1 | 32×T×80 |
| deconv2d_glu_1 (×2) | 64×T×80 | (3,1) | (1,2) | 1 | 1×T×161 |
| linear (×2) | 1×T×161 | — | — | — | 1×T×161 |

Tab. 1 Parameters of the network (k: kernel size; s: stride; d: dilation)
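The GTCM in Tab. 1 stacks eight temporal convolutions with kernel (3,1) and dilations 1, 1, 2, 2, 4, 4, 8, 8 along the time axis. As a rough sanity check (our arithmetic, not a claim from the paper), the temporal receptive field of such a dilated stack follows the standard formula 1 + Σ(k−1)·d:

```python
# Receptive field (in frames) of a stack of dilated convolutions along
# the time axis; kernel size and dilations follow the GTCM rows of Tab. 1.
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    # Each layer adds (k - 1) * d frames of context on top of the
    # single frame seen by a pointwise convolution.
    return 1 + sum((kernel_size - 1) * d for d in dilations)

gtcm_dilations = [1, 1, 2, 2, 4, 4, 8, 8]  # from Tab. 1
print(receptive_field(3, gtcm_dilations))  # 61 frames of temporal context
```

So each GTCM output frame sees roughly 61 STFT frames of context, which is what the exponentially growing dilations buy over a plain convolution stack.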
| Dataset | Noise | Test SNR/dB | Noisy PESQ | Noisy STOI/% | CRN PESQ | CRN STOI/% | GCRN PESQ | GCRN STOI/% | TCNN PESQ | TCNN STOI/% | DARCN PESQ | DARCN STOI/% | GDCRN PESQ | GDCRN STOI/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WSJ0 | babble | -5 | 1.0594 | 60.62 | 1.1316 | 64.94 | 1.1396 | 66.08 | 1.1342 | 65.43 | 1.1462 | 64.88 | 1.1674 | 65.58 |
|  |  | 0 | 1.0844 | 71.08 | 1.3332 | 77.55 | 1.3410 | 79.53 | 1.3681 | 78.93 | 1.3362 | 78.78 | 1.4013 | 80.18 |
|  |  | 5 | 1.1646 | 80.47 | 1.6391 | 85.77 | 1.6255 | 86.66 | 1.5861 | 85.58 | 1.6599 | 87.07 | 1.7230 | 86.96 |
|  | factory2 | -5 | 1.0414 | 58.92 | 1.1159 | 61.81 | 1.1249 | 62.57 | 1.1125 | 63.50 | 1.1269 | 63.97 | 1.1379 | 66.06 |
|  |  | 0 | 1.0515 | 69.11 | 1.2320 | 73.50 | 1.2509 | 76.86 | 1.2498 | 75.56 | 1.2393 | 76.83 | 1.3261 | 78.92 |
|  |  | 5 | 1.0963 | 78.65 | 1.3652 | 82.39 | 1.4783 | 84.41 | 1.4877 | 84.05 | 1.5122 | 84.73 | 1.5798 | 85.80 |
| TIMIT | babble | -5 | 1.0594 | 60.62 | 1.0950 | 61.27 | 1.1248 | 60.83 | 1.0634 | 61.10 | 1.1202 | 60.87 | 1.1605 | 61.36 |
|  |  | 0 | 1.0844 | 71.08 | 1.1699 | 71.87 | 1.2599 | 72.54 | 1.1648 | 71.13 | 1.1713 | 72.38 | 1.3193 | 73.36 |
|  |  | 5 | 1.1646 | 80.47 | 1.2885 | 79.27 | 1.4799 | 80.52 | 1.2947 | 78.48 | 1.4188 | 80.93 | 1.5536 | 81.15 |
|  | factory2 | -5 | 1.0414 | 58.92 | 1.0708 | 61.27 | 1.0907 | 62.00 | 1.0873 | 61.74 | 1.0781 | 61.87 | 1.1254 | 61.75 |
|  |  | 0 | 1.0515 | 69.11 | 1.1374 | 71.87 | 1.1830 | 72.97 | 1.1407 | 71.96 | 1.2124 | 72.94 | 1.2523 | 73.23 |
|  |  | 5 | 1.0963 | 78.65 | 1.2399 | 79.27 | 1.3090 | 79.64 | 1.2902 | 79.68 | 1.3778 | 80.12 | 1.4234 | 80.49 |
| VoiceBank | babble | -5 | 1.0594 | 60.62 | 1.0674 | 62.48 | 1.1221 | 64.36 | 1.1020 | 63.72 | 1.1138 | 63.81 | 1.1444 | 66.83 |
|  |  | 0 | 1.0844 | 71.08 | 1.1550 | 75.50 | 1.2951 | 79.03 | 1.1839 | 77.56 | 1.2770 | 75.05 | 1.3368 | 79.47 |
|  |  | 5 | 1.1646 | 80.47 | 1.3658 | 84.35 | 1.5542 | 86.08 | 1.4171 | 81.68 | 1.4599 | 86.56 | 1.6188 | 86.35 |
|  | factory2 | -5 | 1.0414 | 58.92 | 1.0568 | 62.16 | 1.1012 | 63.91 | 1.0808 | 63.61 | 1.0781 | 64.21 | 1.1082 | 66.40 |
|  |  | 0 | 1.0515 | 69.11 | 1.1335 | 73.10 | 1.2446 | 77.45 | 1.1813 | 74.12 | 1.1775 | 75.35 | 1.2408 | 78.39 |
|  |  | 5 | 1.0963 | 78.65 | 1.2835 | 81.55 | 1.4173 | 84.18 | 1.2948 | 83.70 | 1.3758 | 84.28 | 1.4229 | 85.13 |

Tab. 2 STOI and PESQ for different networks on different datasets
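The per-condition numbers in Tab. 2 are easiest to compare as gains over the unprocessed Noisy baseline rather than as absolute scores. A minimal sketch (PESQ values copied from the WSJ0/babble rows of Tab. 2; the `pesq_gain` helper name is ours, not from the paper):

```python
# PESQ scores from Tab. 2, WSJ0 / babble condition, keyed by test SNR (dB).
noisy = {-5: 1.0594, 0: 1.0844, 5: 1.1646}
gdcrn = {-5: 1.1674, 0: 1.4013, 5: 1.7230}

def pesq_gain(enhanced: dict, baseline: dict) -> dict:
    # Absolute PESQ improvement over the noisy input at each SNR point.
    return {snr: round(enhanced[snr] - baseline[snr], 4) for snr in baseline}

print(pesq_gain(gdcrn, noisy))  # {-5: 0.108, 0: 0.3169, 5: 0.5584}
```

The gain grows with SNR, which matches the usual pattern that enhancement networks recover more quality headroom in milder noise.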
| Noise | Test SNR/dB | Noisy PESQ | Noisy STOI/% | SENet PESQ | SENet STOI/% | CBAM PESQ | CBAM STOI/% | CTFA-a PESQ | CTFA-a STOI/% | CTFA-b PESQ | CTFA-b STOI/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| babble | -5 | 1.0594 | 60.62 | 1.1232 | 62.84 | 1.1573 | 65.13 | 1.1481 | 64.77 | 1.1674 | 65.58 |
|  | 0 | 1.0844 | 71.08 | 1.2931 | 77.25 | 1.3996 | 80.10 | 1.382 | 79.40 | 1.4013 | 80.18 |
|  | 5 | 1.1646 | 80.47 | 1.6067 | 85.80 | 1.7175 | 87.16 | 1.6958 | 86.95 | 1.7230 | 86.96 |
| factory2 | -5 | 1.0414 | 58.92 | 1.1228 | 65.40 | 1.1370 | 65.69 | 1.1227 | 63.98 | 1.1379 | 66.06 |
|  | 0 | 1.0515 | 69.11 | 1.2726 | 77.97 | 1.2946 | 78.79 | 1.2802 | 77.61 | 1.3261 | 78.92 |
|  | 5 | 1.0963 | 78.65 | 1.4668 | 85.01 | 1.4929 | 85.36 | 1.5021 | 84.79 | 1.5798 | 85.80 |

Tab. 3 STOI and PESQ for three attention mechanisms on WSJ0 dataset
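Tab. 3 compares SENet-style channel attention [30] and CBAM [31] against the CTFA variants. For orientation only, here is a minimal numpy sketch of the squeeze-and-excitation idea (random weights, reduction ratio r = 2; all names and shapes are our illustrative assumptions, not the paper's code):

```python
import numpy as np

def squeeze_excite(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Channel attention: squeeze (global average pool) -> excite (two FCs) -> rescale.
    x is a feature map of shape (C, T, F)."""
    z = x.mean(axis=(1, 2))                                     # squeeze: (C,)
    h = np.maximum(w1 @ z, 0.0)                                 # bottleneck FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))                         # expansion FC + sigmoid: (C,)
    return x * s[:, None, None]                                 # rescale each channel

rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.standard_normal((C, 4, 5))
w1 = rng.standard_normal((C // r, C))                           # reduction weights
w2 = rng.standard_normal((C, C // r))                           # expansion weights
y = squeeze_excite(x, w1, w2)
print(y.shape)  # (8, 4, 5)
```

Because the gate `s` lies in (0, 1), each channel is attenuated rather than amplified; CBAM and the CTFA variants extend this idea with spatial (here, time-frequency) attention on top of the channel gate.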
| Network | Parameters/10⁶ |
| --- | --- |
| CRN | 17.58 |
| GCRN | 9.77 |
| TCNN | 5.10 |
| GDCRN | 4.52 |
| DARCN | 1.23 |

Tab. 4 Parameters of five networks
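The totals in Tab. 4 come from summing layer-wise parameter counts. As an illustration of the arithmetic only (the standard dense Conv2d formula, not a reconstruction of any full network above), the first encoder convolution of Tab. 1 maps 2 channels to 16 with a 3×3 kernel:

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: tuple[int, int], bias: bool = True) -> int:
    # Standard parameter count for a dense 2-D convolution:
    # one (in_ch x kh x kw) filter plus an optional bias per output channel.
    kh, kw = kernel
    return out_ch * (in_ch * kh * kw + (1 if bias else 0))

# First encoder layer from Tab. 1: 2 -> 16 channels, kernel (3,3).
print(conv2d_params(2, 16, (3, 3)))  # 304 parameters
```

Early layers are cheap; the millions in Tab. 4 accumulate in the wide later convolutions and the recurrent or temporal-convolution blocks.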
References

1. LAN T, PENG C, LI S, et al. An overview of monaural speech denoising and dereverberation research[J]. Journal of Computer Research and Development, 2020, 57(5): 928-953 (in Chinese). 10.7544/issn1000-1239.2020.20190306
2. ANDERSEN K T, MOONEN M. Robust speech-distortion weighted interframe Wiener filters for single-channel noise reduction[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(1): 97-107. 10.1109/taslp.2017.2761699
3. WANG Y, BROOKES M. Model-based speech enhancement in the modulation domain[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(3): 580-594. 10.1109/taslp.2017.2786863
4. ZHANG Q, WU J. Improved Wiener filter speech enhancement based on multi-taper spectrum estimation[J]. Computer Applications and Software, 2017, 34(3): 67-70, 118 (in Chinese). 10.3969/j.issn.1000-386x.2017.03.011
5. ZHONG X, DAI Y, DAI Y, et al. Study on processing of wavelet speech denoising in speech recognition system[J]. International Journal of Speech Technology, 2018, 21: 563-569. 10.1007/s10772-018-9516-7
6. MARTIN R. Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors[C]// Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2002: I-253-I-256. 10.1109/icassp.2002.1005724
7. FARAJI N, KOHANSAL A. MMSE and maximum a posteriori estimators for speech enhancement in additive noise assuming a t-location-scale clean speech prior[J]. IET Signal Processing, 2018, 12(4): 532-543. 10.1049/iet-spr.2017.0446
8. DIVENYI P. Speech Separation by Humans and Machines[M]. [S.l.]: Kluwer Academic Publishers, 2005: 181-197. 10.1007/b99695
9. WANG Y, NARAYANAN A, WANG D L. On training targets for supervised speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 1849-1858. 10.1109/taslp.2014.2352935
10. MOWLAEE P, SAEIDI R, MARTIN R. Phase estimation for signal reconstruction in single-channel speech separation[C]// Proceedings of Interspeech 2012. Grenoble, France: International Speech Communication Association, 2012: 1548-1551. 10.21437/interspeech.2012-436
11. ERDOGAN H, HERSHEY J R, WATANABE S, et al. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks[C]// Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2015: 708-712. 10.1109/icassp.2015.7178061
12. WILLIAMSON D S, WANG Y, WANG D. Complex ratio masking for monaural speech separation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, 24(3): 483-492. 10.1109/taslp.2015.2512042
13. XU Y, DU J, DAI L-R, et al. An experimental study on speech enhancement based on deep neural networks[J]. IEEE Signal Processing Letters, 2014, 21(1): 65-68. 10.1109/lsp.2013.2291240
14. LU X, TSAO Y, MATSUDA S, et al. Speech enhancement based on deep denoising autoencoder[C]// Proceedings of Interspeech 2013. Grenoble, France: International Speech Communication Association, 2013: 436-440. 10.21437/interspeech.2013-130
15. ZHOU L, GAO Y, WANG Z, et al. Complex spectral mapping with attention based convolution recurrent neural network for speech enhancement[EB/OL]. [2023-04-01].
16. XU Y, DU J, DAI L-R, et al. A regression approach to speech enhancement based on deep neural networks[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, 23(1): 7-19. 10.1109/taslp.2014.2364452
17. PARK S R, LEE J. A fully convolutional neural network for speech enhancement[EB/OL]. (2016-09-22) [2023-04-01]. 10.21437/interspeech.2017-1465
18. GERS F A, SCHMIDHUBER J, CUMMINS F. Learning to forget: continual prediction with LSTM[J]. Neural Computation, 2000, 12(10): 2451-2471. 10.1162/089976600300015015
19. SALEEM N, KHATTAK M I, AL-HASAN M, et al. On learning spectral masking for single channel speech enhancement using feedforward and recurrent neural networks[J]. IEEE Access, 2020, 8: 160581-160595. 10.1109/access.2020.3021061
20. CHEN J, WANG D L. Long short-term memory for speaker generalization in supervised speech separation[J]. The Journal of the Acoustical Society of America, 2017, 141(6): 4705-4714. 10.1121/1.4986931
21. LI X, LI Y, DONG Y, et al. Bidirectional LSTM network with ordered neurons for speech enhancement[C]// Proceedings of Interspeech 2020. Grenoble, France: International Speech Communication Association, 2020: 2702-2706. 10.21437/interspeech.2020-2245
22. TAN K, WANG D L. A convolutional recurrent neural network for real-time speech enhancement[C]// Proceedings of Interspeech 2018. Grenoble, France: International Speech Communication Association, 2018: 3229-3233. 10.21437/interspeech.2018-1405
23. TAAL C H, HENDRIKS R C, HEUSDENS R, et al. A short-time objective intelligibility measure for time-frequency weighted noisy speech[C]// Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2010: 4214-4217. 10.1109/icassp.2010.5495701
24. RIX A W, BEERENDS J G, HOLLIER M P, et al. Perceptual Evaluation of Speech Quality (PESQ) — a new method for speech quality assessment of telephone networks and codecs[C]// Proceedings of the 2001 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2001: 749-752.
25. HU Y, LIU Y, LV S, et al. DCCRN: deep complex convolution recurrent network for phase-aware speech enhancement[EB/OL]. (2020-08-01) [2023-04-01]. 10.21437/interspeech.2020-2537
26. KRAWCZYK M, GERKMANN T. STFT phase reconstruction in voiced speech for an improved single-channel speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 1931-1940. 10.1109/taslp.2014.2354236
27. TAN K, WANG D L. Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 28: 380-390. 10.1109/taslp.2019.2955276
28. DAUPHIN Y N, FAN A, AULI M, et al. Language modeling with gated convolutional networks[C]// Proceedings of the 34th International Conference on Machine Learning. New York: JMLR.org, 2017: 933-941.
29. WANG P, CHEN P, YUAN Y, et al. Understanding convolution for semantic segmentation[C]// Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2018: 1451-1460. 10.1109/wacv.2018.00163
30. HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 7132-7141. 10.1109/cvpr.2018.00745
31. WOO S, PARK J, LEE J-Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 2018 European Conference on Computer Vision. Cham: Springer, 2018: 3-19. 10.1007/978-3-030-01234-2_1
32. BAI S, KOLTER J Z, KOLTUN V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[EB/OL]. (2018-03-04) [2023-04-01].
33. Wuhan Polytechnic University. Attention-based speech enhancement method and system for complex convolutional neural networks: 202211448140.6[P]. 2022-11-18 (in Chinese).
34. GAROFOLO J, GRAFF D, PAUL D, et al. CSR-I (WSJ0) Complete LDC93S6A[R]. Philadelphia: Linguistic Data Consortium, 1993.
35. VARGA A, STEENEKEN H J M. Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems[J]. Speech Communication, 1993, 12(3): 247-251. 10.1016/0167-6393(93)90095-3
36. PANDEY A, WANG D L. TCNN: temporal convolutional neural network for real-time speech enhancement in the time domain[C]// Proceedings of the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing. Piscataway: IEEE, 2019: 6875-6879. 10.1109/icassp.2019.8683634
37. LI A, ZHENG C, FAN C, et al. A recursive network with dynamic attention for monaural speech enhancement[EB/OL]. [2023-04-01]. 10.21437/interspeech.2020-1513