Improving machine simultaneous interpretation by punctuation recovery

doi:10.11772/j.issn.1001-9081.2019101711

Journal of Computer Applications, 2020, Vol. 40, Issue (4): 972-977. DOI: 10.11772/j.issn.1001-9081.2019101711

• Artificial intelligence •

CHEN Yuna, SHI Xiaodong

School of Informatics, Xiamen University, Xiamen Fujian 361005, China

Received: 2019-10-11 Revised: 2019-12-02 Online: 2020-04-17 Published: 2020-04-10
Supported by:
This work is partially supported by the Key Project of the National Social Science Foundation of China (16AZD049) and the Language Research Project Outstanding Achievement Late Fund of the National Language Commission of China (WT135-38).

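Corresponding author: SHI Xiaodong
About the authors: CHEN Yuna (born 1995), female, from Quanzhou, Fujian, is a master's degree candidate whose research interests include natural language processing and machine translation. SHI Xiaodong (born 1966), male, from Jiangyin, Jiangsu, is a professor with a Ph.D. and a CCF member whose research interests include natural language processing, machine translation, and artificial intelligence.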

Abstract

Abstract: In a Machine Simultaneous Interpretation (MSI) pipeline system, semantic incompleteness occurs when Automatic Speech Recognition (ASR) outputs are fed directly into Neural Machine Translation (NMT). To address this problem, a model based on Bidirectional Encoder Representation from Transformers (BERT) and Focal Loss was proposed. First, several segments generated by the ASR system were cached and joined into a string. Then, a BERT-based sequence labeling model was used to recover the punctuation of the string, with Focal Loss as the training loss function to alleviate the class imbalance caused by unpunctuated samples outnumbering punctuated ones. Finally, the punctuation-restored string was input into NMT. Experimental results on English-German and Chinese-English translation show that, in terms of translation quality, MSI using the proposed punctuation recovery model improves by 8.19 BLEU and 4.24 BLEU respectively over MSI that feeds ASR outputs directly into NMT, and by 2.28 BLEU and 3.66 BLEU respectively over MSI using a punctuation recovery model based on a bidirectional recurrent neural network with attention mechanism. Therefore, the proposed model can be effectively applied to MSI.
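The two central steps of the abstract, the Focal Loss and the labeling of a cached string, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The label set `LABELS`, the `gamma` and `alpha` defaults, the whitespace tokenization, and the `toy_tagger` stand-in for the BERT-based model are all assumptions made for illustration. The loss follows the general Focal Loss form, -alpha * (1 - p_t)^gamma * log(p_t).

```python
import math

# Hypothetical label set: "O" means no punctuation follows the token.
LABELS = ["O", ",", ".", "?"]


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal Loss for one token: -alpha * (1 - p_t)**gamma * log(p_t).

    probs  : predicted class probabilities for the token (sum to 1)
    target : index of the gold label in LABELS
    gamma, alpha : illustrative defaults, not the paper's settings
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def restore_punctuation(segments, predict_labels):
    """Join cached ASR segments, label each token, and attach punctuation.

    predict_labels stands in for the BERT-based tagger: it takes a list of
    tokens and returns one label from LABELS per token.
    """
    tokens = " ".join(segments).split()
    labels = predict_labels(tokens)
    pieces = [tok if lab == "O" else tok + lab for tok, lab in zip(tokens, labels)]
    return " ".join(pieces)


# A confidently predicted "O" token contributes far less loss than a poorly
# predicted punctuation token, which is how the loss counters class imbalance.
easy = focal_loss([0.9, 0.05, 0.03, 0.02], target=0)
hard = focal_loss([0.6, 0.1, 0.2, 0.1], target=1)


# Toy tagger used only to exercise the pipeline; a real system would call the model.
def toy_tagger(tokens):
    return ["." if tok in {"today", "you"} else "O" for tok in tokens]


restored = restore_punctuation(["hello everyone welcome", "today thank you"], toy_tagger)
```

With `gamma=0` and `alpha=1` the function reduces to ordinary cross-entropy, so `gamma` controls how strongly well-classified tokens are down-weighted. In the pipeline the abstract describes, the string returned by `restore_punctuation` is what would be passed on to NMT.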

Key words: Machine Simultaneous Interpretation (MSI), punctuation recovery, Focal Loss, Automatic Speech Recognition (ASR), pretrained language model

CLC Number: TP391.1

CHEN Yuna, SHI Xiaodong. Improving machine simultaneous interpretation by punctuation recovery[J]. Journal of Computer Applications, 2020, 40(4): 972-977.
