Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (11): 3222-3228. DOI: 10.11772/j.issn.1001-9081.2016.11.3222

• Virtual Reality and Digital Media •

Chinese syllable segmentation method based on two-dimensional time-frequency energy features

ZHANG Yang, ZHAO Xiaoqun, WANG Digang

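  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2016-05-27 Revised: 2016-06-20 Online: 2016-11-10 Published: 2016-11-12
  • Corresponding author: ZHAO Xiaoqun
  • About the authors: ZHANG Yang (born 1989), male, from Binzhou, Shandong, Ph.D. candidate; main research interest: speech signal processing. ZHAO Xiaoqun (born 1962), male, from Yi'an, Heilongjiang, professor, Ph.D.; main research interests: speech signal processing and coding theory. WANG Digang (born 1988), male, from Shanghai, Ph.D. candidate; main research interest: coding theory.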

Chinese speech segmentation into syllables based on energies in different times and frequencies

ZHANG Yang, ZHAO Xiaoqun, WANG Digang   

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2016-05-27 Revised: 2016-06-20 Online: 2016-11-10 Published: 2016-11-12

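Abstract (from the Chinese): A more accurate speech segmentation method can greatly improve the efficiency of tasks such as corpus annotation, and helps align speech with models in applications such as speech recognition. A new method for segmenting Chinese speech into syllables was designed using the two-dimensional time-frequency energy features of Chinese speech. Silence frames were identified with a traditional method; unvoiced frames were identified from the two-dimensional energy at the same time and different frequencies; voiced frames and speech frames were identified from the 0-1 two-dimensional energy at different times within specific frequency bands; the four decisions were then combined to give the syllable segmentation positions. Experimental results show that the segmentation accuracy of this method is better than that of the Merging-Based Syllable Detection Automaton (MBSDA) and the Gaussian fitting method; its syllable segmentation error is 0.0297 s and its syllable segmentation deviation rate is 7.93%.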

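Keywords (from the Chinese): syllable segmentation, two-dimensional time-frequency, short-time energy, segmentation deviation rate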

Abstract: Accurate speech segmentation can greatly improve the efficiency of tasks such as corpus annotation, and helps align speech with models in applications such as speech recognition. A new method for segmenting Chinese speech into syllables, based on two-dimensional time-frequency energy features, was proposed. Firstly, silence frames were identified in the traditional way; secondly, unvoiced frames were identified from the energies at different frequencies within the same time frame; thirdly, voiced frames and speech frames were identified using 0-1 energies in specific frequency bands across time; finally, the syllable segmentation positions were determined by combining the four judgements above. The experimental results show that the proposed method, with a syllable segmentation error of 0.0297 s and a syllable segmentation deviation rate of 7.93%, is more accurate than the Merging-Based Syllable Detection Automaton (MBSDA) and the Gaussian fitting method.

Key words: speech segmentation into syllables, two-dimensional time-frequency, short-time energy, segmentation deviation rate
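
The abstract names short-time energy and energy measured across both time and frequency as the features behind the method, without giving formulas. The following Python sketch only illustrates those two generic quantities and a conventional energy-threshold silence test. It is not the authors' algorithm: the function names, the 25 ms frame and 10 ms hop, the band edges, the relative threshold, and the synthetic test signal are all assumptions chosen for illustration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (the incomplete tail is dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def band_energies(frames, sample_rate, band_edges_hz):
    """Energy of every frame in every band: a (time x frequency) energy grid."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    columns = [
        power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
        for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:])
    ]
    return np.stack(columns, axis=1)


def silence_mask(frames, rel_threshold=0.01):
    """Flag frames whose short-time energy is below a fraction of the maximum.

    The relative threshold is an assumed value, not one taken from the paper.
    """
    energy = (frames ** 2).sum(axis=1)
    return energy < rel_threshold * energy.max()


# Synthetic input (assumed): 0.5 s of faint noise followed by 0.5 s of a 200 Hz tone.
sample_rate = 16000
rng = np.random.default_rng(0)
t = np.arange(sample_rate // 2) / sample_rate
signal = np.concatenate([
    1e-4 * rng.standard_normal(sample_rate // 2),
    0.5 * np.sin(2 * np.pi * 200.0 * t),
])

frames = frame_signal(signal, frame_len=400, hop=160)   # 25 ms frames, 10 ms hop
mask = silence_mask(frames)                             # True where a frame looks silent
grid = band_energies(frames, sample_rate, [0, 1000, 4000, 8000])
```

Here `mask` marks the low-energy frames at the start of the synthetic signal, and each row of `grid` shows how one frame's energy is distributed over the three assumed bands. Comparing such per-band energies within a frame, or tracking one band over time, is the kind of measurement the abstract describes for the unvoiced and voiced decisions, whose actual rules are not given on this page.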

CLC number: