Journal of Computer Applications, 2025, Vol. 45, Issue (2): 362-370. DOI: 10.11772/j.issn.1001-9081.2024020232

• Artificial intelligence •

End-to-end Vietnamese text normalization method based on editing constraints

Ming JIANG1,2, Linqin WANG1,2, Hua LAI1,2, Shengxiang GAO1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650504, China
  2. Key Laboratory of Artificial Intelligence in Yunnan Province (Kunming University of Science and Technology), Kunming Yunnan 650500, China
  • Received: 2024-03-05 Revised: 2024-04-17 Accepted: 2024-04-25 Online: 2025-02-24 Published: 2025-02-10
  • Contact: Hua LAI
  • About author: JIANG Ming, born in 1997, M. S. candidate. His research interests include information retrieval and text-to-speech.
    WANG Linqin, born in 1995, Ph. D. candidate. His research interests include machine translation and text-to-speech.
    GAO Shengxiang, born in 1977, Ph. D., associate professor. Her research interests include machine translation, information retrieval, and text-to-speech.
  • Supported by:
    National Natural Science Foundation of China (62376111); Yunnan High-tech Industry Development Project (201606); Yunnan Province Key Research and Development Program (202302AD080003); Yunnan Basic Research Program (202001AS070014); Reserve Talents Program of Yunnan Province Academic and Technical Leaders (202105AC160018)

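End-to-end Vietnamese text normalization method based on editing constraints (Chinese version)

JIANG Ming1,2, WANG Linqin1,2, LAI Hua1,2, GAO Shengxiang1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504
  2. Key Laboratory of Artificial Intelligence in Yunnan Province (Kunming University of Science and Technology), Kunming 650500
  • Corresponding author: LAI Hua
  • About the authors: JIANG Ming (born 1997), male, from Ziyang, Sichuan, M. S. candidate; main research interests: information retrieval, text-to-speech.
    WANG Linqin (born 1995), male, from Qujing, Yunnan, Ph. D. candidate; main research interests: machine translation, text-to-speech.
    GAO Shengxiang (born 1977), female, from Dali, Yunnan, associate professor, Ph. D., CCF member; main research interests: machine translation, information retrieval, text-to-speech.
  • Funding:
    National Natural Science Foundation of China (62376111); Yunnan High-tech Industry Development Project (201606); Yunnan Province Key Research and Development Program (202302AD080003); Yunnan Basic Research Program (202001AS070014); Reserve Talents Program of Yunnan Province Academic and Technical Leaders (202105AC160018)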

Abstract:

Text normalization is an indispensable step in the front-end analysis of Text-To-Speech (TTS) systems, and its main challenge is semantic ambiguity, particularly that of non-standard words such as numbers, dates, and times. To address this problem, an end-to-end text normalization method based on editing constraints was proposed. Taking the linguistic characteristics of Vietnamese fully into account, a labelling method was designed specifically for Vietnamese to improve the model's ability to capture contextual semantic information. Furthermore, because neural network models are prone to unrecoverable errors, an editing alignment algorithm was proposed to constrain the scope of non-standard word text, thereby reducing the search space at the decoding end and avoiding prediction errors on non-normalized text caused by the limitations of the model itself. The FastCorrect model was selected as the baseline, and various optimization methods were applied to it to obtain new models. Experimental results show that, in the Vietnamese experiments comparing optimization methods, the proposed model improves precision by 23.71 percentage points over the baseline model using unlabeled data, and by 26.24 percentage points in the corresponding Chinese experiments. The method therefore performs well not only on Vietnamese but also on Chinese open-source data, confirming its applicability beyond Vietnamese. Moreover, the model using the proposed method surpasses six baseline models with a precision of 97.14% and outperforms the Weighted Finite-State Transducer (WFST) two-stage method by 2.29 percentage points in F1-score, verifying the superiority of the proposed method in text normalization tasks.
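
The idea of constraining edits to non-standard words can be illustrated with a minimal sketch. It is not the paper's editing alignment algorithm: it assumes whitespace tokenization, uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner, and the names `editable_mask`, `constrained_rewrite`, and `rewrite` are hypothetical. The sketch marks which source tokens differ from the normalized reference, then copies every unmarked token verbatim so that only marked tokens can be rewritten.

```python
from difflib import SequenceMatcher


def editable_mask(source_tokens, target_tokens):
    """Flag source tokens that fall inside a non-matching aligned region."""
    matcher = SequenceMatcher(a=source_tokens, b=target_tokens, autojunk=False)
    mask = [False] * len(source_tokens)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag != "equal":
            # Tokens source_tokens[i1:i2] are replaced or deleted in the target.
            for i in range(i1, i2):
                mask[i] = True
    return mask


def constrained_rewrite(source_tokens, mask, rewrite):
    """Copy unflagged tokens unchanged; call rewrite() only on flagged ones.

    rewrite is a hypothetical stand-in for a model decoder: it takes one
    token and returns a list of output tokens.
    """
    output = []
    for token, is_editable in zip(source_tokens, mask):
        if is_editable:
            output.extend(rewrite(token))
        else:
            output.append(token)
    return output


# Illustrative sentence (assumed example, not taken from the paper's data).
source = "Ngày 2/9 có 3 sự kiện".split()
target = "Ngày hai tháng chín có ba sự kiện".split()

mask = editable_mask(source, target)

# A lookup table plays the role of the decoder in this sketch.
table = {"2/9": ["hai", "tháng", "chín"], "3": ["ba"]}
normalized = constrained_rewrite(source, mask, lambda tok: table[tok])
print(" ".join(normalized))
```

Because tokens outside the mask are copied rather than generated, an error in the rewriting step cannot alter ordinary words, which is the kind of search-space restriction the abstract describes.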

Key words: Vietnamese, text normalization, editing alignment algorithm, Text-To-Speech (TTS), end-to-end

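Abstract (Chinese version):

Text normalization is an indispensable step in the front-end analysis task of Text-To-Speech (TTS), and semantic ambiguity is the main problem faced by text normalization tasks, for example the semantic ambiguity of non-standard words such as numbers, dates, and times. To address this problem, an end-to-end text normalization method based on editing constraints was proposed, and, after fully considering the linguistic characteristics of Vietnamese, a labelling method dedicated to Vietnamese was designed to improve the model's ability to model contextual semantic information. In addition, to address the tendency of neural network models to produce unrecoverable errors, an editing alignment algorithm was proposed to constrain the scope of non-standard word text effectively and reduce the search space at the decoding end, thereby avoiding prediction errors on non-normalized text caused by the limitations of the model itself. The FastCorrect model was selected as the baseline model, and various optimization methods were applied to it to obtain new models. Experimental results show that, in the Vietnamese comparison experiments on different optimization methods, the precision of the proposed model is 23.71 percentage points higher than that of the baseline model using unlabeled data, and in the corresponding Chinese experiments the precision is 26.24 percentage points higher. The proposed method thus not only performs well on Vietnamese but also achieves significant results on Chinese open-source data, verifying its applicability beyond Vietnamese. Moreover, compared with six types of baseline models, the model using the proposed method achieves the highest precision, 97.14%, and exceeds the two-stage Weighted Finite-State Transducer (WFST) method by 2.29 percentage points in F1-score, demonstrating the superiority of the proposed method in text normalization tasks.

Key words: Vietnamese, text normalization, editing alignment algorithm, Text-To-Speech (TTS), end-to-end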
