Journal of Computer Applications, 2010, Vol. 30, Issue (1): 282-284.

• Typical Applications

Duration model optimization for trainable text-to-speech systems

吕浩音   

  1. Longdong University, Gansu Province
  • Received: 2009-07-21 Revised: 2009-08-06 Online: 2010-01-01 Published: 2010-01-01
  • Corresponding author: 吕浩音

Duration model optimization in HMM-based TTS


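Abstract: Text-to-speech (TTS) conversion is a key technology for human-computer interaction. Current speech synthesis systems based on hidden Markov models (HMMs) can already produce speech with high naturalness and intelligibility. Compared with natural speech, however, their prosodic rhythm is weak, mainly because of inaccurate durations. This paper generates state durations by jointly optimizing the likelihoods of three levels of duration models: state, phone and syllable. Because both state durations and the durations of longer units are taken into account, fewer errors accumulate in the estimated state durations during re-estimation. Experiments on a Mandarin corpus show that the optimized duration model produces more accurate state durations. Compared with the state-level baseline system, the phone-duration root-mean-square error (RMSE) falls from 19.90 ms to 17.45 ms. Subjective evaluation also shows that the improved model outperforms the baseline model.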
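The joint optimization described above can be written out as a maximum-likelihood problem. This is a minimal sketch under stated assumptions, not the paper's exact formulation. It assumes that:

- the durations of state $k$, phone $p$ and the whole syllable are modelled by single Gaussians $(\mu_k,\sigma_k^2)$, $(m_p,v_p)$ and $(m_s,v_s)$;
- a phone's duration is the sum of its state durations;
- durations are treated as continuous.

All of this notation is introduced here for illustration. The paper's weighting of the three levels and any integer constraints may differ.

```latex
% d = (d_1, ..., d_K)^T : durations of the K states in one syllable
% S_p : set of states belonging to phone p;  a_p : 0/1 indicator vector of S_p
\[
\mathcal{L}(\mathbf{d}) =
  \sum_{k=1}^{K} \log \mathcal{N}\!\left(d_k;\ \mu_k, \sigma_k^2\right)
  + \sum_{p=1}^{P} \log \mathcal{N}\!\Big(\sum_{k \in S_p} d_k;\ m_p, v_p\Big)
  + \log \mathcal{N}\!\Big(\sum_{k=1}^{K} d_k;\ m_s, v_s\Big)
\]
% Setting \partial\mathcal{L}/\partial d_k = 0 for every k gives a linear system:
\[
\Big(\Sigma^{-1} + \sum_{p=1}^{P} \frac{\mathbf{a}_p \mathbf{a}_p^{\top}}{v_p}
      + \frac{\mathbf{1}\mathbf{1}^{\top}}{v_s}\Big)\,\mathbf{d}
  = \Sigma^{-1}\boldsymbol{\mu}
  + \sum_{p=1}^{P} \frac{m_p}{v_p}\,\mathbf{a}_p
  + \frac{m_s}{v_s}\,\mathbf{1},
\qquad \Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_K^2)
\]
```

$\Sigma^{-1}$ is positive definite and the added rank-one terms are positive semidefinite, so the system has a unique solution, and that solution is the maximizer. Dropping the phone and syllable terms leaves $\mathbf{d} = \boldsymbol{\mu}$: each state simply takes its mean duration, as in a purely state-level model.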

Key words: hidden Markov model, syllable duration, Gaussian distribution, maximum likelihood

Abstract: Text-to-speech (TTS) is one of the key technologies of human-computer interaction. Current state-of-the-art HMM-based TTS systems produce highly intelligible and natural speech with good segmental quality; however, their durations tend to be unnatural. In this paper, state durations were generated by jointly maximizing the duration likelihoods of state, phone and syllable units. Considering the durations of states and of longer units jointly limited the accumulation of errors in the estimated state durations during optimization. Experiments on Mandarin databases show that the optimized model predicts durations more accurately than the baseline state duration model, reducing the phone RMSE by 2.45 ms. A perceptual test further confirms that the optimized duration model outperforms the baseline system.
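The joint maximization of state, phone and syllable duration likelihoods can be sketched in Python as follows. The sketch rests on these assumptions:

- each level has a single-Gaussian duration model given as (mean, variance) in frames;
- every phone duration equals the sum of its state durations;
- the solution is continuous.

Setting the gradient of the summed log-likelihoods to zero gives a linear system. It is solved here with plain Gaussian elimination so that only the standard library is needed. `joint_state_durations` and `rmse` are hypothetical helper names, not code from the paper.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[float, float]  # assumed (mean, variance) of a duration, in frames


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def joint_state_durations(states: Sequence[Gaussian],
                          phones: Sequence[Tuple[Sequence[int], Gaussian]],
                          syllable: Gaussian) -> List[float]:
    """Hypothetical helper: state durations maximizing the sum of state-,
    phone- and syllable-level Gaussian log-likelihoods within one syllable.

    phones: (indices of the states in the phone, phone Gaussian) pairs.
    """
    k = len(states)
    a = [[0.0] * k for _ in range(k)]
    b = [0.0] * k
    # State level: diagonal precision 1/sigma_k^2 and right-hand side mu_k/sigma_k^2.
    for i, (mu, var) in enumerate(states):
        a[i][i] += 1.0 / var
        b[i] += mu / var
    # Phone and syllable levels: each constrains the SUM of a group of
    # state durations, contributing an outer-product block a_p a_p^T / v_p.
    groups = [(list(idx), g) for idx, g in phones] + [(list(range(k)), syllable)]
    for idx, (mean, var) in groups:
        for i in idx:
            b[i] += mean / var
            for j in idx:
                a[i][j] += 1.0 / var
    return solve_linear(a, b)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error, e.g. between predicted and reference phone durations."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


if __name__ == "__main__":
    # Toy numbers, not data from the paper: two states with mean 2 frames,
    # no phone models, and a syllable model with mean 8 frames.
    print(joint_state_durations([(2.0, 1.0), (2.0, 1.0)], [], (8.0, 1.0)))
```

In the toy call, with all variances equal to 1, the syllable-level term pulls both states from 2 frames up to 10/3 ≈ 3.33 frames. A state-only model would keep them at their means. A real synthesizer would also round each duration to a positive whole number of frames; that step is omitted here.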

Key words: Hidden Markov Model (HMM), syllable duration, Gaussian distribution, maximum likelihood value