Journal of Computer Applications, 2020, Vol. 40, Issue (1): 1-7. DOI: 10.11772/j.issn.1001-9081.2019061071

• Artificial intelligence •

Review of speech segmentation and endpoint detection

YANG Jian, LI Zhenpeng, SU Peng   

  1. School of Mathematics and Computer Science, Dali University, Dali Yunnan 671003, China
  • Received: 2019-06-24 Revised: 2019-09-04 Online: 2020-01-10 Published: 2019-10-08
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71661001) and the Planning Project of Philosophy and Social Sciences of Yunnan Province (YB2017072).


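  • Corresponding author: YANG Jian (杨健)
  • About the authors: YANG Jian (杨健, born 1976), male, from Shangyu, Zhejiang, associate professor, Ph.D., CCF member; research interests: speech recognition and deep neural networks. LI Zhenpeng (李振鹏, born 1976), male, from Shenyang, Liaoning, associate professor, Ph.D.; research interest: applied statistics. SU Peng (苏鹏, born 1975), male, from Jinan, Shandong, associate professor, Ph.D.; research interest: behavior rule mining.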

Abstract: Speech segmentation is an essential foundational task in speech recognition and speech synthesis, and its quality strongly affects the performance of downstream systems. Manual segmentation and labeling is highly accurate, but it is time-consuming and laborious and requires skilled domain experts. Automatic speech segmentation has therefore become a research hotspot in speech processing. This review first surveys current progress in automatic speech segmentation and explains several ways of classifying segmentation methods. It then introduces alignment-based methods and boundary detection-based methods in turn, and describes in detail the neural network-based segmentation methods that can be applied within both frameworks. Next, it introduces newer segmentation techniques based on approaches such as bio-inspired signals and game theory, and presents, compares, and analyzes the performance evaluation metrics widely used in the field. Finally, it summarizes the above and proposes important directions for future research on speech segmentation.

Key words: speech segmentation, endpoint detection, speech synthesis, signal feature, Artificial Neural Network (ANN)

