Journal of Computer Applications, 2020, Vol. 40, Issue (1): 1-7. DOI: 10.11772/j.issn.1001-9081.2019061071
Special Issue: Reviews
Section: Artificial intelligence
Review of speech segmentation and endpoint detection
YANG Jian (杨健), LI Zhenpeng (李振鹏), SU Peng (苏鹏)
Received: 2019-06-24
Revised: 2019-09-04
Online: 2019-10-08
Published: 2020-01-10
Corresponding author: YANG Jian
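About the authors:
YANG Jian (1976-), male, born in Shangyu, Zhejiang; associate professor, Ph.D., CCF member; research interests: speech recognition, deep neural networks.
LI Zhenpeng (1976-), male, born in Shenyang, Liaoning; associate professor, Ph.D.; research interests: applied statistics.
SU Peng (1975-), male, born in Jinan, Shandong; associate professor, Ph.D.; research interests: behavior rule mining.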
YANG Jian, LI Zhenpeng, SU Peng. Review of speech segmentation and endpoint detection[J]. Journal of Computer Applications, 2020, 40(1): 1-7.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2019061071