Journal of Computer Applications


Bispectrum-Based Nonlinear Feature Coupling for Speech Enhancement

  

  • Received: 2025-06-19  Revised: 2025-07-18  Accepted: 2025-07-23  Online: 2025-08-01  Published: 2025-08-01
  • Supported by:
    National Natural Science Foundation of China; National Natural Science Foundation of China; National Natural Science Foundation of China; Key R&D Program of Yunnan Province; Open Fund of Yunnan Provincial Key Laboratory of Artificial Intelligence


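Yu Zhengtao (余正涛)1, Luan Yixue (栾逸雪)1, Wang Wenjun (王文君)2, Dong Ling (董凌)2, Xiang Yan (相艳)3, Gao Shengxiang (高盛祥)1

  1. Kunming University of Science and Technology
  2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  3. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500
  • Corresponding author: Luan Yixue (栾逸雪)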

Abstract: Most existing time-frequency domain speech enhancement methods apply the short-time Fourier transform (STFT) and then model the linear characteristics of the signal with second-order spectral statistics, neglecting the higher-order nonlinear interactions that may be present in speech. This paper proposes a speech enhancement method based on bispectrum nonlinear feature coupling. The method adopts an encoder-decoder architecture and introduces a bispectrum feature extraction module after the encoder to capture the phase coupling and nonlinear structural information revealed by third-order statistics. Skip connections fuse the bispectral features with the encoder features, enabling deeper modeling of both amplitude and phase. Experiments on the VoiceBank+DEMAND dataset show that the proposed method achieves a Perceptual Evaluation of Speech Quality (PESQ) score of 3.57, a 15.53% improvement over the baseline. It also achieves relative gains of 5.51%, 3.08%, and 10.31% on the mean opinion score measures for signal distortion (CSIG), background noise intrusiveness (CBAK), and overall speech quality (COVL), respectively.

Key words: speech enhancement, bispectral analysis, feature coupling, higher-order nonlinearity, skip connections
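
The abstract relies on the fact that third-order statistics expose phase coupling that second-order spectra cannot see. As a rough illustration only, the sketch below computes the classical direct bispectrum estimate, B(k1, k2) = E[X(k1) X(k2) X*(k1 + k2)], with NumPy. It is not the paper's feature extraction module; the function names `bispectrum` and `make_signal`, the frame length, and the test tones are assumptions chosen for the example.

```python
import numpy as np


def bispectrum(x, nfft=64, hop=32):
    """Direct bispectrum estimate averaged over windowed frames.

    B(k1, k2) = mean over frames of X(k1) * X(k2) * conj(X(k1 + k2)).
    Only the quadrant 0 <= k1, k2 < nfft // 2 is returned.
    """
    window = np.hanning(nfft)
    n_frames = 1 + (len(x) - nfft) // hop
    k = np.arange(nfft // 2)
    idx_sum = k[:, None] + k[None, :]  # k1 + k2, always < nfft
    B = np.zeros((nfft // 2, nfft // 2), dtype=complex)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + nfft]
        X = np.fft.fft((frame - frame.mean()) * window)
        B += X[k][:, None] * X[k][None, :] * np.conj(X[idx_sum])
    return B / n_frames


def make_signal(coupled, n_frames=200, nfft=64, bins=(6, 10), seed=0):
    """Hypothetical test signal: three tones at bins k1, k2, k1 + k2.

    Phases are redrawn for every frame. When `coupled` is True the third
    phase equals the sum of the first two (quadratic phase coupling).
    """
    rng = np.random.default_rng(seed)
    n = np.arange(nfft)
    k1, k2 = bins
    frames = []
    for _ in range(n_frames):
        p1, p2, p3 = rng.uniform(0, 2 * np.pi, size=3)
        if coupled:
            p3 = p1 + p2
        frames.append(
            np.cos(2 * np.pi * k1 * n / nfft + p1)
            + np.cos(2 * np.pi * k2 * n / nfft + p2)
            + np.cos(2 * np.pi * (k1 + k2) * n / nfft + p3)
        )
    return np.concatenate(frames)


# Non-overlapping frames (hop == nfft) so each frame sees one phase draw.
B_coupled = bispectrum(make_signal(coupled=True), nfft=64, hop=64)
B_uncoupled = bispectrum(make_signal(coupled=False), nfft=64, hop=64)
print(abs(B_coupled[6, 10]), abs(B_uncoupled[6, 10]))
```

Both test signals have the same power spectrum, so a second-order analysis cannot tell them apart. The bispectrum magnitude at (k1, k2) stays large only when the phases are coupled, because the product X(k1) X(k2) X*(k1 + k2) then has a consistent phase across frames instead of averaging toward zero.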

