Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (10): 3284-3293. DOI: 10.11772/j.issn.1001-9081.2024101463

• Multimedia Computing and Computer Simulation •

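Monaural speech enhancement with heterogeneous dual-branch decoding based on multi-view attention

Gengzangcuomao1,2, Heming HUANG1,2

  1. College of Computer, Qinghai Normal University, Xining 810000
    2. The State Key Laboratory of Tibetan Intelligence (Qinghai Normal University), Xining 810008
  • Received: 2024-10-21 Revised: 2025-03-02 Accepted: 2025-03-10 Online: 2025-10-14 Published: 2025-10-10
  • Corresponding author: Heming HUANG
  • About the authors: Gengzangcuomao, born in 1993, female (Tibetan), from Gonghe County, Qinghai, lecturer, Ph. D. candidate, CCF member. Her research interests include speech enhancement and speech recognition.
    HUANG Heming, born in 1969, male (Tibetan), from Haidong, Qinghai, professor, Ph. D., CCF member. His research interests include pattern recognition and intelligent systems. Email: 1021489068@qq.com
  • Supported by:
    National Natural Science Foundation of China (62066039); Natural Science Foundation of Qinghai Province (2022-ZJ-925)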

Monaural speech enhancement with heterogeneous dual-branch decoding based on multi-view attention

Gengzangcuomao1,2, Heming HUANG1,2

  1. College of Computer, Qinghai Normal University, Xining Qinghai 810008, China
    2. The State Key Laboratory of Tibetan Intelligence (Qinghai Normal University), Xining Qinghai 810008, China
  • Received: 2024-10-21 Revised: 2025-03-02 Accepted: 2025-03-10 Online: 2025-10-14 Published: 2025-10-10
  • Contact: Heming HUANG
  • About the authors: Gengzangcuomao, born in 1993, Ph. D. candidate, lecturer. Her research interests include speech enhancement and speech recognition.
    HUANG Heming, born in 1969, Ph. D., professor. His research interests include pattern recognition and intelligent systems.
  • Supported by:
    National Natural Science Foundation of China (62066039); Natural Science Foundation of Qinghai Province (2022-ZJ-925)

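Abstract:

To address insufficient acoustic feature extraction, channel information loss, and difficult amplitude-phase compensation in the mainstream encoder-decoder structures used for monaural speech enhancement, a heterogeneous dual-branch decoding model for monaural speech enhancement that fuses speech features of different dimensions, HDBMV (Heterogeneous Dual-Branch with Multi-View), is proposed. The model improves monaural speech enhancement through an Information Fusion Encoder (IFE), a Time-Frequency Residual Conformer (TFRC) module, a Multi-View Attention (MVA) module, and a Heterogeneous Dual-Branch Decoder (HDBD). First, the IFE jointly processes amplitude and complex features, capturing global dependencies and local correlations to produce a compact feature representation. Second, the TFRC module effectively captures correlations along the time and frequency dimensions while reducing computational complexity. Third, the MVA module reconstructs channel-domain and time-frequency-domain information, further strengthening the model's ability to represent information from multiple views and at multiple levels. Finally, the HDBD processes amplitude features and refines complex features in separate branches, addressing the amplitude-phase compensation problem and improving decoding robustness. Experimental results show that HDBMV achieves Perceptual Evaluation of Speech Quality (PESQ) scores of 3.00, 3.12, and 2.09 and Short-Time Objective Intelligibility (STOI) scores of 0.96, 0.97, and 0.81 on the public VoiceBank+DEMAND dataset, the large-scale DNS Challenge 2020 dataset, and the self-built Tibetan dataset BodSpeDB, respectively. These results indicate that HDBMV achieves the best speech enhancement performance among the compared models, along with strong generalization ability, while having the smallest number of parameters and high computational efficiency.

Key words: speech enhancement, encoder-decoder, Conformer, attention mechanism, complex feature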
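The claim that the TFRC module captures time and frequency correlations while lowering computational cost can be made concrete with a back-of-the-envelope count. The sketch below assumes, as in common two-stage Conformer designs, that attention runs along the time axis for each frequency bin and then along the frequency axis for each frame, rather than over all T×F bins at once. The paper's exact TFRC layout may differ. The function names and the sizes T = 100 frames and F = 201 bins are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope comparison of attention score-matrix sizes.
# Assumption: TFRC-style processing attends along time, then along frequency,
# instead of letting every (time, frequency) bin attend to every other bin.


def full_attention_scores(t: int, f: int) -> int:
    """Entries in the score matrix when all T*F bins attend to each other."""
    n = t * f
    return n * n


def time_then_freq_scores(t: int, f: int) -> int:
    """Entries when each of the F frequency bins attends over T frames,
    then each of the T frames attends over F frequency bins."""
    return f * t * t + t * f * f


if __name__ == "__main__":
    T, F = 100, 201  # hypothetical: 100 frames, 201 bins (e.g. a 400-point FFT)
    full = full_attention_scores(T, F)
    axial = time_then_freq_scores(T, F)
    print(full)                    # 404010000
    print(axial)                   # 6050100
    print(round(full / axial, 1))  # 66.8, which equals T*F / (T + F)
```

Because the ratio simplifies to T·F/(T+F), the saving grows with both the number of frames and the number of frequency bins. This is why axis-wise attention is a common way to keep Conformer-based enhancers tractable.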

Abstract:

To address insufficient acoustic feature extraction, channel information loss, and difficult amplitude-phase compensation in mainstream encoder-decoder structures, a heterogeneous dual-branch decoding model for monaural speech enhancement, HDBMV (Heterogeneous Dual-Branch with Multi-View), was proposed by combining speech features from different dimensions. In the model, monaural speech enhancement performance was improved through an Information Fusion Encoder (IFE), a Time-Frequency Residual Conformer (TFRC) module, a Multi-View Attention (MVA) module, and a Heterogeneous Dual-Branch Decoder (HDBD). Firstly, amplitude and complex features were processed jointly by the IFE, capturing both global dependencies and local correlations to generate compact feature representations. Secondly, the TFRC module was used to capture correlations along both the time and frequency dimensions effectively while reducing computational complexity. Thirdly, the MVA module was used to reconstruct information across the channel and time-frequency domains, further enhancing the model's ability to represent information from multiple views and at multiple levels. Finally, the HDBD was used to process amplitude features and refine complex features in separate branches, thereby solving the amplitude-phase compensation problem and improving decoding robustness. Experimental results show that HDBMV achieves Perceptual Evaluation of Speech Quality (PESQ) scores of 3.00, 3.12, and 2.09 and Short-Time Objective Intelligibility (STOI) scores of 0.96, 0.97, and 0.81 on the public dataset VoiceBank+DEMAND, the large-scale dataset DNS Challenge 2020, and the self-built Tibetan dataset BodSpeDB, respectively. It can be seen that HDBMV obtains the best speech enhancement performance among the compared models and strong generalization ability, while having the smallest number of parameters and high computational efficiency.

Key words: speech enhancement, encoder-decoder, Conformer, attention mechanism, complex feature
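The split between amplitude (magnitude) features and complex features in the IFE input and the HDBD output can be illustrated on individual time-frequency bins. The sketch below is an assumption modeled on common magnitude-mask plus complex-refinement decoders, not the paper's published implementation. The encoder input stacks a power-compressed magnitude with the matching compressed real and imaginary parts. The magnitude branch supplies a mask applied together with the noisy phase, and the complex branch adds a residual that can correct both magnitude and phase. The compression exponent 0.3, the mask and residual values, and the function names `encoder_input` and `dual_branch_decode` are hypothetical.

```python
import cmath
import math

COMPRESS = 0.3  # hypothetical power-law compression exponent


def encoder_input(noisy: complex, c: float = COMPRESS) -> tuple[float, float, float]:
    """Assumed IFE-style input for one bin: compressed magnitude plus
    compressed real and imaginary parts that share the noisy phase."""
    mag = abs(noisy) ** c
    phase = cmath.phase(noisy)
    return mag, mag * math.cos(phase), mag * math.sin(phase)


def dual_branch_decode(noisy: complex, mask: float, residual: complex,
                       c: float = COMPRESS) -> complex:
    """Assumed HDBD-style fusion for one bin.

    mask     -- magnitude-branch output in [0, 1], applied in the compressed domain
    residual -- complex-branch output added to the coarse compressed estimate
    """
    mag = abs(noisy) ** c
    phase = cmath.phase(noisy)
    coarse = cmath.rect(mask * mag, phase)  # masked magnitude with the noisy phase
    refined = coarse + residual             # complex refinement can shift magnitude and phase
    # Undo the compression on the magnitude and keep the refined phase.
    return cmath.rect(abs(refined) ** (1.0 / c), cmath.phase(refined))


if __name__ == "__main__":
    noisy_bins = [complex(0.8, 0.6), complex(-0.2, 0.1)]
    masks = [0.5, 0.9]
    residuals = [0j, complex(0.01, -0.02)]
    print(encoder_input(noisy_bins[0]))
    for y, m, r in zip(noisy_bins, masks, residuals):
        print(y, "->", dual_branch_decode(y, m, r))
```

With a zero residual, the output keeps the noisy phase and only the magnitude branch acts. A nonzero residual lets the complex branch compensate for phase errors that a magnitude mask alone cannot fix, which is the kind of amplitude-phase compensation the abstract attributes to the dual-branch decoder.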
