Chinese speech segmentation method based on Gauss distribution of time spans of syllables

doi:10.11772/j.issn.1001-9081.2016.05.1410

Abstract

Abstract: To date, there is no accurate method for segmenting Chinese natural speech into syllables, although such a method would be valuable because it could replace manual labeling of speech that has a reference text. Based on two hypotheses, namely that the time spans of Chinese syllables in the same pronunciation environment obey a Gaussian distribution and that a short-time energy valley exists between two adjacent syllables, a Chinese speech segmentation method based on the Gaussian distribution of syllable time spans was proposed. A simplified method based on the distribution of energy valleys was also given, which effectively reduced the time complexity of the segmentation method. The experimental results show that the segmentation accuracy (the mean of the squared time differences between manual labels and the labels created by this method) reaches the order of 10^-3, and that the computing time is less than 1 s in Matlab on a desktop PC.

Key words: Chinese, natural speech, speech segmentation, time span, valley, Gauss distribution
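
The two hypotheses in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame parameters, the local-minimum valley test, and the dynamic-programming search are all assumptions made for illustration, and the per-syllable duration means and standard deviations are assumed to be given in frames. Candidate boundaries are taken at short-time energy valleys, and the boundaries are chosen so that the resulting syllable durations maximize the sum of Gaussian log-likelihoods. The last function computes the accuracy measure described in the abstract, the mean of the squared time differences from manual labels.

```python
import numpy as np

def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame (frame_len and hop in samples)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def energy_valleys(energy):
    """Indices of interior frames that are local minima of the energy curve."""
    e = np.asarray(energy, dtype=float)
    idx = np.arange(1, len(e) - 1)
    return idx[(e[idx] < e[idx - 1]) & (e[idx] <= e[idx + 1])]

def log_gauss(d, mu, sigma):
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * ((d - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def segment(valleys, n_frames, means, stds):
    """Choose len(means) - 1 boundaries among the valleys (assumed search,
    not the paper's algorithm) that maximize the Gaussian duration likelihood."""
    cands = [0] + sorted(int(v) for v in valleys) + [n_frames]
    n_syl, n_cand = len(means), len(cands)
    # best[k, j]: best log-likelihood with k syllables ending at cands[j]
    best = np.full((n_syl + 1, n_cand), -np.inf)
    back = np.zeros((n_syl + 1, n_cand), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_syl + 1):
        for j in range(1, n_cand):
            for i in range(j):
                if not np.isfinite(best[k - 1, i]):
                    continue
                score = best[k - 1, i] + log_gauss(
                    cands[j] - cands[i], means[k - 1], stds[k - 1])
                if score > best[k, j]:
                    best[k, j], back[k, j] = score, i
    if not np.isfinite(best[n_syl, n_cand - 1]):
        raise ValueError("not enough candidate valleys for the given syllables")
    # Backtrack from the last frame to recover the interior boundaries.
    bounds, j = [], n_cand - 1
    for k in range(n_syl, 1, -1):
        j = back[k, j]
        bounds.append(cands[j])
    return bounds[::-1]

def mean_squared_time_error(auto_labels, manual_labels):
    """Mean of squared differences between automatic and manual boundary times."""
    a, m = np.asarray(auto_labels, float), np.asarray(manual_labels, float)
    return float(np.mean((a - m) ** 2))
```

The exhaustive search over valley pairs grows quickly with the number of valleys and syllables, which is the kind of cost a simplified method would aim to reduce; this sketch makes no attempt to reproduce that simplification.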

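Abstract (translated from the Chinese): Research on syllable segmentation of Chinese natural speech has clear practical significance, because a reasonably accurate segmentation method can replace manual labeling of speech that has a reference text. To date, however, there is no fully accurate syllable segmentation method for Chinese speech. Based on two hypotheses, that the time spans of Chinese syllables in the same pronunciation environment obey some Gaussian distribution and that a short-time energy valley exists between adjacent syllables, a Chinese syllable segmentation method based on Gaussian fitting of syllable time spans is proposed. The algorithm is then analyzed, and a simplified algorithm is proposed that exploits the way the short-time energy valleys are dispersed across the individual speech sub-segments after a preliminary segmentation; it effectively reduces the time complexity of the segmentation method. Experimental results show that the segmentation accuracy (the mean of the squared time distance from the manually labeled segmentation) reaches the third decimal place, and that the running time in Matlab on a desktop computer never exceeds 1 s, which meets application requirements.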

Key words (translated from the Chinese): Chinese, natural speech, syllable segmentation, time span, valley, Gauss distribution

CLC Number: TP391.4

ZHANG Yang, ZHAO Xiaoqun, WANG Digang. Chinese speech segmentation method based on Gauss distribution of time spans of syllables[J]. Journal of Computer Applications, 2016, 36(5): 1410-1414.

张扬, 赵晓群, 王缔罡. 基于音节时间长度高斯拟合的汉语音节切分方法[J]. 计算机应用, 2016, 36(5): 1410-1414.

Background

ZHANG Yang, born in 1989, Ph. D. candidate. His research interests include speech signal processing.
ZHAO Xiaoqun, born in 1962, Ph. D., professor. His research interests include speech signal processing and coding theory.
WANG Digang, born in 1988, Ph. D. candidate. His research interests include coding theory.
