Journal of Computer Applications, 2026, Vol. 46, Issue (5): 1596-1603. DOI: 10.11772/j.issn.1001-9081.2025050674

• Multimedia computing and computer simulation •

Bispectrum-based nonlinear feature coupling method for speech enhancement

Zhengtao YU1,2, Yixue LUAN1,2, Wenjun WANG1,2, Ling DONG1,2, Yan XIANG1,2, Shengxiang GAO1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650504, China
  2. Key Laboratory of Artificial Intelligence in Yunnan Province (Kunming University of Science and Technology), Kunming Yunnan 650504, China
  • Received: 2025-06-19 Revised: 2025-07-18 Accepted: 2025-07-23 Online: 2025-08-01 Published: 2026-05-10
  • Contact: Zhengtao YU
  • About author:LUAN Yixue, born in 2000, M. S. candidate. Her research interests include speech enhancement, speech recognition.
    WANG Wenjun, born in 1988, Ph. D. candidate. His research interests include speech recognition, natural language processing.
    DONG Ling, born in 1984, Ph. D. candidate, lecturer. His research interests include speech recognition, natural language processing.
    XIANG Yan, born in 1979, Ph. D., associate professor. Her research interests include natural language processing.
    GAO Shengxiang, born in 1977, Ph. D., professor. Her research interests include natural language processing, machine translation, speech recognition, speech synthesis.
  • Supported by:
    National Natural Science Foundation of China (U24A20334); Key Research and Development Program of Yunnan Province (202303AP140008); Open Fund of Key Laboratory of Artificial Intelligence in Yunnan Province (CB24069D018A)

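Chinese title: 基于双谱非线性特征耦合的语音增强方法 (Speech enhancement method based on bispectral nonlinear feature coupling)

YU Zhengtao (余正涛)1,2, LUAN Yixue (栾逸雪)1,2, WANG Wenjun (王文君)1,2, DONG Ling (董凌)1,2, XIANG Yan (相艳)1,2, GAO Shengxiang (高盛祥)1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology (昆明理工大学 信息工程与自动化学院), Kunming 650504
  2. Key Laboratory of Artificial Intelligence in Yunnan Province (Kunming University of Science and Technology) (云南省人工智能重点实验室(昆明理工大学)), Kunming 650504
  • Corresponding author: YU Zhengtao (余正涛)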
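  • About the authors (from the Chinese record): LUAN Yixue (栾逸雪), born in 2000, female, from Gejiu, Yunnan, M.S. candidate. Research interests: speech enhancement, speech recognition.
    WANG Wenjun (王文君), born in 1988, male, from Kunming, Yunnan, Ph.D. candidate. Research interests: speech recognition, natural language processing.
    DONG Ling (董凌), born in 1984, male, from Dali, Yunnan, lecturer, Ph.D. candidate. Research interests: speech recognition, natural language processing.
    XIANG Yan (相艳), born in 1979, female, from Dali, Yunnan, associate professor, Ph.D. Research interests: natural language processing.
    GAO Shengxiang (高盛祥), born in 1977, female, from Eryuan, Yunnan, professor, Ph.D., CCF member. Research interests: natural language processing, machine translation, speech recognition, speech synthesis.
  • Funding (from the Chinese record):
    National Natural Science Foundation of China (U24A20334); National Natural Science Foundation of China (62466030); National Natural Science Foundation of China (62376111); Key Research and Development Program of Yunnan Province (202303AP140008); Open Fund of Key Laboratory of Artificial Intelligence in Yunnan Province (CB24069D018A)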

Abstract:

Current time-frequency-domain speech enhancement methods commonly model the linear characteristics of signals using second-order spectral statistics computed after the Short-Time Fourier Transform (STFT), and neglect the higher-order nonlinear interaction information that speech may contain. To address this issue, a Bispectrum-based Nonlinear Feature Coupling method for speech enhancement (BNFC) was proposed. An encoder-decoder structure was employed as the overall framework, and a bispectral feature extraction module was introduced after the encoder to capture the phase coupling and nonlinear structural information revealed by third-order statistics. The extracted bispectral features were fused with the encoder features through skip connections to achieve deeper amplitude and phase modeling. Experimental results on the VoiceBank+DEMAND dataset showed that BNFC achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.57, a 15.53% improvement over the baseline model BREM (Bispectral Refinement Enhancement Module). In addition, the Mean Opinion Score measures of signal distortion (CSIG), background noise intrusiveness (CBAK), and overall speech quality (COVL) improved by 5.51%, 3.08%, and 10.31%, respectively, validating the importance of higher-order nonlinear feature modeling for speech enhancement tasks.
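The abstract does not describe the internals of the bispectral feature extraction module, so the following is only a minimal NumPy sketch of what a third-order statistic can capture, not the authors' implementation. It uses the standard direct estimate B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over frames. The function names `bispectrum` and `make_signal`, the frame length, the hop size, and the tone bins are all assumptions chosen for illustration.

```python
import numpy as np

def bispectrum(x, nfft=64, hop=64):
    """Direct FFT-based bispectrum estimate, averaged over frames.

    Returns a (nfft//2, nfft//2) complex array B[k1, k2].
    Illustrative sketch only; parameters are assumptions.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(nfft)
    half = nfft // 2
    k = np.arange(half)
    sum_idx = (k[:, None] + k[None, :]) % nfft  # bin index of f1 + f2
    acc = np.zeros((half, half), dtype=complex)
    count = 0
    for start in range(0, len(x) - nfft + 1, hop):
        frame = x[start:start + nfft]
        frame = (frame - frame.mean()) * window  # remove DC, then window
        X = np.fft.fft(frame)
        # Triple product X(f1) X(f2) conj(X(f1 + f2)) for all bin pairs
        acc += X[k][:, None] * X[k][None, :] * np.conj(X[sum_idx])
        count += 1
    if count == 0:
        raise ValueError("signal is shorter than one frame")
    return acc / count

def make_signal(coupled, n_frames=64, nfft=64, k1=6, k2=10, seed=0):
    """Concatenate frames of three tones at bins k1, k2, and k1 + k2.

    Phases are redrawn for every frame. If `coupled` is True, the third
    phase equals the sum of the first two (quadratic phase coupling);
    otherwise it is drawn independently.
    """
    rng = np.random.default_rng(seed)
    n = np.arange(nfft)
    frames = []
    for _ in range(n_frames):
        p1, p2, p3 = rng.uniform(0.0, 2.0 * np.pi, size=3)
        if coupled:
            p3 = p1 + p2
        frames.append(
            np.cos(2 * np.pi * k1 * n / nfft + p1)
            + np.cos(2 * np.pi * k2 * n / nfft + p2)
            + np.cos(2 * np.pi * (k1 + k2) * n / nfft + p3)
        )
    return np.concatenate(frames)

if __name__ == "__main__":
    b_coupled = bispectrum(make_signal(True, seed=0))
    b_uncoupled = bispectrum(make_signal(False, seed=1))
    print("coupled   |B[6, 10]| =", abs(b_coupled[6, 10]))
    print("uncoupled |B[6, 10]| =", abs(b_uncoupled[6, 10]))
```

Both test signals contain the same three tones with the same amplitudes, and only the phase relationship differs. In the coupled case the triple product has the same phase in every frame and adds up coherently, while in the uncoupled case the random phases largely cancel in the average. This is the kind of phase-coupling information that magnitude-only second-order statistics do not expose.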

Keywords: Speech Enhancement (SE), bispectral analysis, feature coupling, higher-order nonlinearity, skip connection

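Abstract (摘要, translated from the Chinese):

Current time-frequency-domain speech enhancement methods generally model the linear characteristics of signals using second-order spectral statistics after the Short-Time Fourier Transform (STFT), ignoring the latent higher-order nonlinear interaction information in speech. To address this problem, a speech enhancement method based on bispectral nonlinear feature coupling (BNFC) is proposed. The method adopts an encoder-decoder structure as its overall framework and introduces a bispectral feature extraction module after the encoder to obtain the phase coupling and nonlinear structural information revealed by third-order statistics; the bispectral features are fused with the encoder features through skip connections to achieve deeper amplitude and phase modeling. Experimental results on the VoiceBank+DEMAND dataset show that BNFC reaches 3.57 on the Perceptual Evaluation of Speech Quality (PESQ) metric, 15.53% higher than the baseline model BREM (Bispectral Refinement Enhancement Module), and improves the signal distortion (CSIG), background noise intrusiveness (CBAK), and overall speech quality (COVL) scores by 5.51%, 3.08%, and 10.31%, respectively, verifying the importance of higher-order nonlinear feature modeling for speech enhancement.

Keywords (关键词): speech enhancement, bispectral analysis, feature coupling, higher-order nonlinearity, skip connection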
