Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (5): 1410-1414. DOI: 10.11772/j.issn.1001-9081.2016.05.1410

• Virtual Reality and Digital Media •

Chinese syllable segmentation method based on Gaussian fitting of syllable durations

ZHANG Yang, ZHAO Xiaoqun, WANG Digang

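  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804
  • Received: 2015-11-13 Revised: 2016-01-10 Publication date: 2016-05-10 Release date: 2016-05-09
  • Corresponding author: ZHAO Xiaoqun
  • About the authors: ZHANG Yang (born 1989), male, from Binzhou, Shandong, Ph.D. candidate; main research interest: speech signal processing. ZHAO Xiaoqun (born 1962), male, from Yi'an, Heilongjiang, professor, Ph.D.; main research interests: speech signal processing and coding theory. WANG Digang (born 1988), male, from Shanghai, Ph.D. candidate; main research interest: coding theory.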

Chinese speech segmentation method based on Gauss distribution of time spans of syllables

ZHANG Yang, ZHAO Xiaoqun, WANG Digang   

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2015-11-13 Revised: 2016-01-10 Online: 2016-05-10 Published: 2016-05-09

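Abstract: Research on syllable segmentation of natural Chinese speech has clear practical value: a reasonably accurate segmentation method can replace manual annotation of speech that comes with a reference text. To date, however, no fully accurate syllable segmentation method for Chinese speech exists. Based on two hypotheses, that syllable durations in Chinese speech under the same pronunciation conditions follow a Gaussian distribution and that a short-time energy valley exists between adjacent syllables, a Chinese syllable segmentation method based on Gaussian fitting of syllable durations is proposed. Analysis of the algorithm shows that the short-time energy valleys found by the preliminary segmentation are spread across the individual sub-segments of speech; a simplified algorithm that exploits this property is proposed, which effectively reduces the time complexity of the segmentation method. Experimental results show that the segmentation accuracy (the mean of the squared time distance from the manually labeled boundaries) reaches the third decimal place, and that the running time in a desktop Matlab environment never exceeds 1 s, which meets application requirements.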
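The two quantities named in the abstract can be written down compactly. The notation below is an assumption introduced here for illustration (the abstract gives no symbols): $d$ is a syllable duration, $\mu$ and $\sigma$ are the fitted mean and standard deviation, $t_i$ is the $i$-th manually labeled boundary, $\hat{t}_i$ is the corresponding automatic boundary, and $N$ is the number of boundaries compared.

```latex
% Hypothesis 1: syllable durations under the same pronunciation conditions
d \sim \mathcal{N}(\mu, \sigma^{2}), \qquad
p(d) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(d-\mu)^{2}}{2\sigma^{2}}\right)

% Accuracy measure: mean of the squared time distance to the manual labels
E = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{t}_i - t_i\right)^{2}
```

Under this reading, the reported result corresponds to $E$ on the order of $10^{-3}$; the abstract does not state the time unit.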

Keywords: Chinese, natural speech, syllable segmentation, duration, valley, Gaussian distribution

Abstract: To date there is no fully accurate method for segmenting natural Chinese speech into syllables, although such a method would be valuable for labeling speech that has a reference text without manual effort. Based on two hypotheses, that the time spans of Chinese syllables under the same pronunciation conditions obey a Gaussian distribution and that a short-time energy valley exists between two adjacent syllables, a Chinese speech segmentation method based on the Gaussian distribution of syllable time spans is proposed. A simplified method based on the distribution of energy valleys is also given, which effectively reduces the time complexity of the segmentation method. The experimental results show that the segmentation accuracy (the mean squared time span between manual labels and the labels created by this method) reaches the order of 10^-3 and that computing times are less than 1 s in Matlab on a desktop PC.
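The two hypotheses can be combined in a small sketch. The abstract does not give the authors' algorithm or its simplified variant, so everything below is an assumption made for illustration: the function names are hypothetical, the syllable count is taken as known from the reference text, the Gaussian parameters `mu` and `sigma` are taken as already fitted, and a plain dynamic program stands in for whatever search the paper actually uses. Candidate boundaries are short-time energy valleys, and the chosen boundaries are those that make the resulting syllable durations most likely under the Gaussian.

```python
import math


def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples per frame; frames start every `hop` samples."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def energy_valleys(energy):
    """Indices of frames whose energy is a local minimum (candidate boundaries)."""
    return [
        i for i in range(1, len(energy) - 1)
        if energy[i] <= energy[i - 1] and energy[i] < energy[i + 1]
    ]


def log_gauss(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def pick_boundaries(candidates, total, n_syllables, mu, sigma):
    """Pick n_syllables - 1 boundary times from `candidates` (valley times)
    so that the summed Gaussian log-likelihood of the durations is maximal."""
    points = [0.0] + sorted(candidates) + [total]
    last = len(points) - 1
    neg = float("-inf")
    # best[k][j]: best score for k syllables that end exactly at points[j]
    best = [[neg] * (last + 1) for _ in range(n_syllables + 1)]
    back = [[None] * (last + 1) for _ in range(n_syllables + 1)]
    best[0][0] = 0.0
    for k in range(1, n_syllables + 1):
        for j in range(1, last + 1):
            for i in range(j):
                if best[k - 1][i] == neg:
                    continue
                score = best[k - 1][i] + log_gauss(points[j] - points[i], mu, sigma)
                if score > best[k][j]:
                    best[k][j] = score
                    back[k][j] = i
    if best[n_syllables][last] == neg:
        raise ValueError("too few candidate valleys for the requested syllable count")
    # Walk the back-pointers from the end of the utterance to its start.
    starts = []
    j = last
    for k in range(n_syllables, 0, -1):
        j = back[k][j]
        starts.append(points[j])
    starts.reverse()      # starts[0] is the utterance start (0.0)
    return starts[1:]     # interior boundaries only


if __name__ == "__main__":
    # Hypothetical valley times (seconds) for a 0.8 s utterance of 4 syllables.
    valleys = [0.08, 0.21, 0.3, 0.39, 0.6]
    print(pick_boundaries(valleys, total=0.8, n_syllables=4, mu=0.2, sigma=0.05))
```

This exhaustive search costs on the order of n_syllables times the square of the number of candidate valleys. The abstract's simplified algorithm reduces the time complexity using the way valleys distribute across sub-segments, which this sketch does not attempt to reproduce.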

Key words: Chinese, natural speech, speech segmentation, time span, valley, Gauss distribution

CLC number: