Speech-to-Speech Translation (S2ST) is an emerging research direction in the field of intelligent speech, aiming to translate speech in one language directly into speech in another. With the growing demand for cross-lingual communication, S2ST has attracted significant attention and driven continuous research. Traditional cascaded models face numerous challenges in S2ST, including error propagation, high inference latency, and the inability to translate languages without a writing system. To address these issues, achieving direct S2ST with end-to-end models has become a key research focus. Based on a comprehensive survey of end-to-end S2ST models, a detailed analysis and summary of the various end-to-end S2ST models is provided, the existing related technologies are reviewed, and the challenges are grouped into three categories: modeling burden, data scarcity, and real-world application, with a focus on how existing work has addressed each. The extensive comprehension and generation capabilities of Large Language Models (LLMs) offer new possibilities for S2ST, while simultaneously presenting additional challenges. Effective ways of applying LLMs to S2ST are discussed, and potential future development directions are outlined.