[1] KARPOV A, MARKOV K, KIPYATKOVA I, et al. Large vocabulary Russian speech recognition using syntactico-statistical language modeling[J]. Speech Communication, 2014, 56(1): 213-228.
[2] KIPYATKOVA I, KARPOV A, VERKHODANOVA V, et al. Analysis of long-distance word dependencies and pronunciation variability at conversational Russian speech recognition[J]. Computer Science and Information Systems, 2012, 11(6): 719-725.
[3] JIAMPOJAMARN S, KONDRAK G, SHERIF T. Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion[C]// Human Language Technologies: Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT). New York: Association for Computational Linguistics, 2007: 372-379.
[4] BISANI M, NEY H. Joint-sequence models for grapheme-to-phoneme conversion[J]. Speech Communication, 2008, 50(5): 434-451.
[5] NOVAK J R, MINEMATSU N, HIROSE K. WFST-based grapheme-to-phoneme conversion: open source tools for alignment, model-building and decoding[EB/OL]. [2017-05-10]. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.9764.
[6] GRAVES A. Generating sequences with recurrent neural networks[EB/OL]. [2017-05-10]. https://arxiv.org/pdf/1308.0850.pdf.
[7] BAHDANAU D, CHO K, BENGIO Y. Neural machine translation by jointly learning to align and translate[EB/OL]. [2017-05-10]. https://arxiv.org/abs/1409.0473.
[8] SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks[C]// NIPS 2014: Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, MA: MIT Press, 2014, 2: 3104-3112.
[9] YAO K, ZWEIG G. Sequence-to-sequence neural net models for grapheme-to-phoneme conversion[EB/OL]. [2017-05-10]. https://arxiv.org/abs/1506.00196.
[10] Wikipedia. IPA symbols for Russian pronunciations[EB/OL]. [2017-10-17]. https://en.wikipedia.org/wiki/Help:IPA_for_Russian.
[11] WELLS J C. SAMPA computer readable phonetic alphabet[M]// Handbook of Standards and Resources for Spoken Language Systems. Berlin: Walter de Gruyter, 1997.
[12] OTANDER J. CMU Sphinx[EB/OL]. (2017-04-26) [2017-10-17]. https://cmusphinx.github.io/wiki/download/.
[13] 信德麟, 张会森, 华劭. 俄语语法[M]. 2版. 北京: 外语教学与研究出版社, 2009: 1-92. (XIN D L, ZHANG H S, HUA S. Russian Grammar[M]. 2nd ed. Beijing: Foreign Language Teaching and Research Press, 2009: 1-92.)
[14] 喻俨, 莫瑜. 深度学习原理与TensorFlow实践[M]. 北京: 电子工业出版社, 2017: 128-139. (YU Y, MO Y. Deep Learning Principle and TensorFlow Practice[M]. Beijing: Publishing House of Electronics Industry, 2017: 128-139.)
[15] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[16] GIMPEL K, SMITH N A. Softmax-margin CRFs: training log-linear models with cost functions[C]// Human Language Technologies: Proceedings of the North American Chapter of the Association for Computational Linguistics. Los Angeles: Association for Computational Linguistics, 2010: 733-736.
[17] CHO K, van MERRIENBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[EB/OL]. [2017-05-10]. https://arxiv.org/abs/1406.1078.
[18] KOEHN P. Pharaoh: a beam search decoder for phrase-based statistical machine translation models[C]// AMTA 2004: Proceedings of the 6th Conference of the Association for Machine Translation in the Americas. Berlin: Springer, 2004: 115-124.
[19] WILLIAMS R J, PENG J. An efficient gradient-based algorithm for on-line training of recurrent network trajectories[J]. Neural Computation, 1990, 2(4): 490-501.
[20] ABADI M, BARHAM P, CHEN J, et al. TensorFlow: a system for large-scale machine learning[C]// Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation. Savannah, GA: USENIX Association, 2016: 265-283.
[21] PETERS B, DEHDARI J, van GENABITH J. Massively multilingual neural grapheme-to-phoneme conversion[C]// Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems (at EMNLP 2017). Copenhagen: Association for Computational Linguistics, 2017: 19-26.
[22] 滕飞, 郑超美, 李文. 基于长短期记忆多维主题情感倾向性分析模型[J]. 计算机应用, 2016, 36(8): 2252-2256. (TENG F, ZHENG C M, LI W. Multidimensional topic model for oriented sentiment analysis based on long short-term memory[J]. Journal of Computer Applications, 2016, 36(8): 2252-2256.)
[23] HANNEMANN M, TRMAL J, ONDEL L, et al. Bayesian joint-sequence models for grapheme-to-phoneme conversion[EB/OL]. [2017-05-10]. http://www.fit.vutbr.cz/research/groups/speech/publi/2017/hannemann_icassp2017_0002836.pdf.
[24] TSVETKOV Y, SITARAM S, FARUQUI M, et al. Polyglot neural language models: a case study in cross-lingual phonetic representation learning[EB/OL]. [2017-05-10]. https://arxiv.org/abs/1605.03832.
[25] MILDE B, SCHMIDT C, KÖHLER J. Multitask sequence-to-sequence models for grapheme-to-phoneme conversion[EB/OL]. [2017-05-10]. http://www.isca-speech.org/archive/Interspeech_2017/pdfs/1436.PDF.