An End-to-End Vietnamese Text Normalization Approach Based on Edit Constraints

JIANG Ming (蒋铭)¹, WANG Linqin (王琳钦)¹, LAI Hua (赖华)², GAO Shengxiang (高盛祥)¹

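1. Kunming University of Science and Technology
2. Faculty of Information Engineering and Automation, Kunming University of Science and Technology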

Received: 2024-03-05 Revised: 2024-04-17 Online: 2024-06-04
Corresponding author: LAI Hua (赖华)
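Funding:
National Natural Science Foundation of China; Yunnan High-Tech Industry Development Project; Yunnan Key Research and Development Program; Yunnan Basic Research Program; Yunnan Reserve Talents Program for Academic and Technical Leaders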

Abstract

Text normalization is an indispensable step in the front-end text analysis of text-to-speech (TTS) synthesis. Semantic ambiguity is the main difficulty of the task, particularly for non-standard words such as numbers and dates. Neural text normalization systems can use context to resolve such ambiguity, but they also produce unrecoverable errors. This study therefore proposes an end-to-end text normalization method based on edit constraints. Taking full account of the linguistic characteristics of Vietnamese, it designs an annotation method specific to Vietnamese text normalization in order to improve the model's ability to capture contextual semantic information. In addition, an edit alignment algorithm restricts the scope of non-standard word text, which reduces the search space during decoding and mitigates text normalization prediction errors arising from the model's inherent limitations. Experimental results show that the proposed method reaches 97% accuracy on Vietnamese text normalization. It is also effective on an open-source Chinese dataset, which supports its applicability beyond the Vietnamese language.

Keywords: Vietnamese, text normalization, edit alignment algorithm, text-to-speech (TTS)
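
The abstract does not describe the edit alignment algorithm in detail, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes whitespace tokenization and uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the function names `align_edits` and `apply_edits` and the example sentence are hypothetical. The point it illustrates is that aligning raw and normalized text isolates the non-standard word spans, so only those spans need to be rewritten while all other tokens are copied unchanged.

```python
from difflib import SequenceMatcher


def align_edits(raw_tokens, norm_tokens):
    """Align raw and normalized tokens; return the spans that differ.

    Each span is (raw_start, raw_end, replacement_tokens). Tokens outside
    every span are identical in both sequences and can be copied as-is.
    SequenceMatcher is an assumed stand-in for the paper's aligner.
    """
    matcher = SequenceMatcher(a=raw_tokens, b=norm_tokens, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((i1, i2, norm_tokens[j1:j2]))
    return spans


def apply_edits(raw_tokens, spans):
    """Rewrite only the listed spans; copy every other token unchanged."""
    out, cursor = [], 0
    for start, end, replacement in spans:
        out.extend(raw_tokens[cursor:start])  # untouched context
        out.extend(replacement)               # normalized span
        cursor = end
    out.extend(raw_tokens[cursor:])
    return out


# Hypothetical example: the digit "3" is read as "ba" in Vietnamese.
raw = "tôi có 3 con mèo".split()
norm = "tôi có ba con mèo".split()

spans = align_edits(raw, norm)
print(spans)                    # [(2, 3, ['ba'])]
print(apply_edits(raw, spans))  # ['tôi', 'có', 'ba', 'con', 'mèo']
```

In this sketch the edit spans play the role of the constraint: a model that predicts replacements only inside such spans cannot alter the surrounding ordinary words.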

JIANG Ming, WANG Linqin, LAI Hua, GAO Shengxiang. An End-to-End Vietnamese Text Normalization Approach Based on Edit Constraints[J]. Journal of Computer Applications (计算机应用), 2025, 45(2): 362-370.