Automatic speech segmentation algorithm based on syllable type recognition
Linjia SUN, Lei QIN, Meijin KANG, Yinglin WANG
Journal of Computer Applications    2025, 45 (6): 2034-2042.   DOI: 10.11772/j.issn.1001-9081.2024060748
Abstract

Boundary-detection methods segment speech data into syllable units by exploiting abrupt changes in the time and frequency domains rather than linguistic knowledge. Satisfactory segmentation results can only be achieved by tuning various parameters, so these methods still suffer from poor stability, difficult parameter adjustment, and weak generalization ability in cross-language environments with large amounts of data. To address these issues, an automatic speech segmentation algorithm based on syllable type recognition was proposed. The defining characteristic of the proposed algorithm is that it recognizes the type of each syllable in the speech data rather than its specific content. First, the common syllable types of different languages under natural pronunciation were obtained from linguistic research findings and syllable composition patterns. Then, an acoustic model for each syllable type was established using the traditional Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM). In addition, to better describe syllable attributes, a feature extraction pipeline based on multi-band analysis and significant information fusion was proposed. Finally, based on the sequence of recognized syllable types, the Viterbi algorithm was used to determine the speech frames corresponding to the start and end points of syllables. In the experiments, the acoustic models of the syllable types were trained on speech data from three common languages, and recognition experiments were then conducted on six languages and dialects.
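The final step, mapping a recognized sequence of syllable types onto start and end frames, can be illustrated with a simplified Viterbi alignment. This is only a sketch under strong assumptions, not the authors' implementation: each recognized syllable is collapsed into a single state (the paper uses GMM-HMM models), transitions are restricted to "stay" or "advance to the next syllable", and the function name `viterbi_align` and the `loglik` input layout are hypothetical.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def viterbi_align(loglik: List[List[float]]) -> List[Tuple[int, int]]:
    """Align frames to a known left-to-right sequence of syllables.

    loglik[t][k] is the (assumed, precomputed) log-likelihood of frame t
    under the k-th syllable of the recognized sequence.
    Returns one (start_frame, end_frame) pair per syllable, inclusive.
    """
    T, K = len(loglik), len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per syllable")

    score = [[NEG_INF] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    score[0][0] = loglik[0][0]  # the path must start in the first syllable

    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            advance = score[t - 1][k - 1] if k > 0 else NEG_INF
            if advance > stay:
                score[t][k] = advance + loglik[t][k]
                back[t][k] = k - 1
            else:
                score[t][k] = stay + loglik[t][k]
                back[t][k] = k

    # Backtrack from the last syllable at the last frame.
    path = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        path[t] = k
        k = back[t][k]

    # Collapse the frame-level path into (start, end) frame pairs.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((start, t - 1))
            start = t
    return segments


# Toy input: frames 0-2 favor the first syllable, frames 3-5 the second.
toy = [[-1, -5], [-1, -5], [-1, -5], [-5, -1], [-5, -1], [-5, -1]]
print(viterbi_align(toy))  # [(0, 2), (3, 5)]
```

Frame indices convert to times by multiplying by the frame shift, which the abstract does not state.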
The experimental results show that the average recognition accuracy is over 91.93%; compared with Mel-Frequency Cepstral Coefficient (MFCC) features, the proposed features increase the average recognition accuracy by at least 27.16 percentage points; with a tolerance threshold of 20 ms, an average segmentation accuracy of over 90.70% is still achieved across the six languages and dialects; and compared with four representative algorithms from recent years, the proposed algorithm improves the average segmentation accuracy by at least 5.73 percentage points. These results demonstrate that the proposed algorithm has stronger generalization ability, better stability, and higher accuracy.
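To make the "tolerance threshold" concrete, the sketch below shows one common way to score segmentation: a reference boundary counts as found if an unused detected boundary lies within the tolerance. The abstract does not give the exact scoring rule, so the one-to-one greedy matching, the function name `boundary_accuracy`, and the example boundary times are all assumptions for illustration.

```python
from typing import Sequence


def boundary_accuracy(reference_ms: Sequence[int],
                      detected_ms: Sequence[int],
                      tolerance_ms: int = 20) -> float:
    """Fraction of reference boundaries matched by a detected boundary
    within +/- tolerance_ms. Each detected boundary is used at most once."""
    ref = sorted(reference_ms)
    det = sorted(detected_ms)
    if not ref:
        return 0.0

    matched = 0
    j = 0
    for r in ref:
        # Skip detected boundaries that are too early to match r.
        while j < len(det) and det[j] < r - tolerance_ms:
            j += 1
        if j < len(det) and abs(det[j] - r) <= tolerance_ms:
            matched += 1
            j += 1  # consume this detected boundary
    return matched / len(ref)


# Made-up boundary times in milliseconds, for illustration only.
reference = [100, 250, 400, 610]
detected = [95, 275, 405, 600]
print(boundary_accuracy(reference, detected))  # 0.75
```

In this example the boundary detected at 275 ms is 25 ms from the reference at 250 ms, so it falls outside the 20 ms tolerance and three of the four reference boundaries are matched.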
