Double complex convolutional and attention aggregating recurrent network for speech enhancement

doi:10.11772/j.issn.1001-9081.2022101533

Journal of Computer Applications

Received: 2022-10-13 Revised: 2022-12-25 Accepted: 2022-12-28 Online: 2023-04-12 Published: 2023-04-12
Contact: ZHAN Yong-zhao
Supported by:
Key Program of the National Natural Science Foundation of China; Key Research and Development Program of Jiangsu Province

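Bennian YU¹, Yongzhao ZHAN¹, Qirong MAO¹,², Wenlong DONG¹, Honglin LIU¹

1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China
2. Jiangsu Engineering Research Center of Big Data Ubiquitous Perception and Intelligent Agriculture Applications, Zhenjiang, Jiangsu 212013, China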

Abstract

Abstract: To address the limited representation of correlation information in spectrogram features and the unsatisfactory denoising performance of existing speech enhancement methods, a Double Complex Convolutional Attention Aggregation Recurrent Network (DCCARN) was proposed. Firstly, a double complex convolutional network was established to encode the spectrogram features obtained by the short-time Fourier transform in two separate branches. Secondly, the encodings in the two branches were processed by an inter-feature-block attention mechanism and an intra-feature-block attention mechanism respectively, so that different speech feature information was re-labeled. Then, the long-term sequence information was processed by Long Short-Term Memory (LSTM), after which the spectrogram features were restored by two decoders and aggregated. Finally, the estimated speech waveform was generated by the inverse short-time Fourier transform, thereby suppressing noise. Experiments were carried out on the public dataset Voice Bank + DEMAND (VBD) and on a noise-added version of the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT). The results show that, compared with the phase-aware Deep Complex Convolution Recurrent Network (DCCRN), DCCARN improves the Perceptual Evaluation of Speech Quality (PESQ) by 5.597% and 2.672% on the two datasets respectively. This verifies that the proposed method captures the correlation information in spectrogram features more accurately, suppresses noise more effectively and improves speech intelligibility.
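The abstract does not state how the complex convolution in each branch is computed. As a minimal sketch, assuming the construction commonly used in phase-aware networks such as DCCRN, a complex convolution can be assembled from four real-valued convolutions on the real and imaginary parts of the spectrogram and of the kernel. The function names below (`conv1d_real`, `complex_conv1d`, `complex_conv1d_direct`) are hypothetical, and the 1-D, single-channel setting is a simplification for illustration, not the layer configuration of DCCARN.

```python
def conv1d_real(x, w):
    """Valid-mode 1-D cross-correlation of two real sequences
    (the operation deep-learning libraries call 'convolution')."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


def complex_conv1d(x, w):
    """Complex convolution built from four real convolutions.

    With x = x_r + j*x_i and w = w_r + j*w_i:
        real part = x_r * w_r - x_i * w_i
        imag part = x_r * w_i + x_i * w_r
    where * denotes the real-valued convolution above.
    """
    x_r = [v.real for v in x]
    x_i = [v.imag for v in x]
    w_r = [v.real for v in w]
    w_i = [v.imag for v in w]

    rr = conv1d_real(x_r, w_r)
    ii = conv1d_real(x_i, w_i)
    ri = conv1d_real(x_r, w_i)
    ir = conv1d_real(x_i, w_r)

    return [complex(a - b, c + d) for a, b, c, d in zip(rr, ii, ri, ir)]


def complex_conv1d_direct(x, w):
    """Reference implementation using Python's built-in complex arithmetic."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


if __name__ == "__main__":
    # Toy complex "spectrogram" slice and kernel (illustrative values only).
    x = [1 + 2j, 3 - 1j, 1j]
    w = [1j, 2 + 0j]
    print(complex_conv1d(x, w))
    print(complex_conv1d_direct(x, w))
```

Splitting the operation into real-valued convolutions lets a complex layer be expressed with standard real-valued building blocks while still coupling the real and imaginary parts, which is what allows magnitude and phase information to be modeled jointly.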

Key words: speech enhancement, attention mechanism, complex convolutional network, encoding, Long Short-Term Memory (LSTM)

CLC Number: TN912.34

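Bennian YU, Yongzhao ZHAN, Qirong MAO, Wenlong DONG, Honglin LIU. Double complex convolutional and attention aggregating recurrent network for speech enhancement [J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2022101533.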
