Russian phonetic transcription system based on TensorFlow

doi:10.11772/j.issn.1001-9081.2017092149

Journal of Computer Applications, 2018, Vol. 38, Issue (4): 971-977.

FENG Wei, YI Mianzhu, MA Yanzhou

The PLA Strategic Support Force Information Engineering University (Luoyang), Luoyang, Henan 471003, China

Received: 2017-09-04 Revised: 2017-11-18 Online: 2018-04-10 Published: 2018-04-09
Corresponding author: YI Mianzhu
About the authors: FENG Wei (born 1993), male, from Xi'an, Shaanxi, master's student; main research interest: natural language processing. YI Mianzhu (born 1964), male, from Yingshan, Sichuan, professor, Ph.D.; main research interests: computational linguistics and language information processing. MA Yanzhou (born 1977), male, from Luoyang, Henan, associate professor, Ph.D.; main research interests: computational linguistics and language information processing.
Supported by:
This work is partially supported by the Project of Social Science Planning of Luoyang (2016B285).

Abstract

Abstract: To address the limited size of pronunciation dictionaries in Russian speech synthesis and speech recognition systems, a Russian grapheme-to-phoneme algorithm based on a Long Short-Term Memory (LSTM) sequence-to-sequence model was proposed, and a prototype phonetic transcription system was designed and implemented. Firstly, an improved Russian phoneme set based on the Speech Assessment Methods Phonetic Alphabet (SAMPA) was designed so that transcription results reflect the stress position and vowel reduction of Russian words, and a 20 000-word Russian pronunciation dictionary was constructed according to the new phoneme set. Then, the proposed algorithm was implemented with the TensorFlow framework, in which a Russian word is converted into a fixed-dimensional vector by an encoder LSTM, and the vector is then converted into the target pronunciation sequence by a decoder LSTM. Finally, a Russian phonetic transcription system with functions such as interactive word transcription was designed and implemented. Experimental results show that, on the out-of-vocabulary test set, the word accuracy reaches 74.8% and the phoneme accuracy reaches 94.5%, both higher than those of the Phonetisaurus method. The system can effectively support the construction of Russian pronunciation dictionaries.

Key words: Russian, phonetic transcription, Long Short-Term Memory (LSTM), sequence-to-sequence, TensorFlow
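
The encoder-decoder data flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' TensorFlow implementation: it is a plain-Python forward pass with random, untrained weights, and the class names (`LSTMCell`, `Seq2SeqG2P`), the hidden size, the greedy decoding strategy, and the toy phoneme symbols are all assumptions made for illustration. It only shows how an encoder LSTM reduces a word of any length to a fixed-size state, and how a decoder LSTM unrolls that state into a phoneme sequence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """A single LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        width = input_size + hidden_size
        # One weight matrix and bias vector per gate:
        # i = input gate, f = forget gate, o = output gate, g = candidate.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.biases = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _gate(self, name, activation, z):
        return [activation(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.weights[name], self.biases[name])]

    def step(self, x, h, c):
        z = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", sigmoid, z)
        f = self._gate("f", sigmoid, z)
        o = self._gate("o", sigmoid, z)
        g = self._gate("g", math.tanh, z)
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


class Seq2SeqG2P:
    """Hypothetical toy model: encoder LSTM -> fixed state -> decoder LSTM."""

    def __init__(self, graphemes, phonemes, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.graphemes = list(graphemes)
        self.phonemes = ["<s>", "</s>"] + list(phonemes)  # start/end markers
        self.hidden_size = hidden_size
        self.encoder = LSTMCell(len(self.graphemes), hidden_size, rng)
        self.decoder = LSTMCell(len(self.phonemes), hidden_size, rng)
        # Output projection from the decoder hidden state to phoneme scores.
        self.proj = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                     for _ in self.phonemes]

    def encode(self, word):
        """Read the word letter by letter; every letter must be in graphemes."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for ch in word:
            x = one_hot(self.graphemes.index(ch), len(self.graphemes))
            h, c = self.encoder.step(x, h, c)
        return h, c  # same size for every word, whatever its length

    def decode(self, state, max_len=20):
        """Greedily emit phonemes until the end marker or max_len is reached."""
        h, c = state
        token = 0  # index of the start marker "<s>"
        output = []
        for _ in range(max_len):
            x = one_hot(token, len(self.phonemes))
            h, c = self.decoder.step(x, h, c)
            scores = [sum(w * v for w, v in zip(row, h)) for row in self.proj]
            # Greedy choice; the start marker is never emitted as output.
            token = max(range(1, len(scores)), key=scores.__getitem__)
            if self.phonemes[token] == "</s>":
                break
            output.append(self.phonemes[token])
        return output


# Toy usage: the phoneme symbols below are placeholders, not the paper's set.
model = Seq2SeqG2P("абвгдеёжзийклмнопрстуфхцчшщъыьэюя", ["a", "k", "l", "m", "o"])
print(model.decode(model.encode("молоко"), max_len=10))
```

Because the weights are random, the printed phonemes are meaningless; a trained model would learn the weights from a pronunciation dictionary such as the 20 000-word one described above. The point of the sketch is the interface: `encode` always returns a state of `hidden_size` values, and `decode` depends only on that state.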

CLC number: TP391.1

FENG Wei, YI Mianzhu, MA Yanzhou. Russian phonetic transcription system based on TensorFlow[J]. Journal of Computer Applications, 2018, 38(4): 971-977.

