We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repeatedly translated from scratch as it grows. This approach naturally exhibits very low latency and high final quality, but at the cost of incremental instability as the output is continuously refined. We experiment with a pipeline of industry-grade speech recognition and translation tools, augmented with simple inference heuristics to improve stability. We use TED Talks as a source of multilingual test data, developing our techniques on English-to-German spoken language translation. Our minimalist approach to simultaneous translation allows us to easily scale our final evaluation to six more target languages, dramatically improving incremental stability for all of them.
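The re-translation loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: each time the recognized source grows, the full prefix is translated from scratch, and a simple "mask the last k tokens" heuristic (one plausible stability heuristic; the `translate` stub and all names here are hypothetical) withholds the tail of each fresh hypothesis, since trailing tokens are the most likely to be revised.

```python
def translate(source_prefix: str) -> str:
    """Stand-in for an off-the-shelf full-sentence MT system (illustrative:
    it just uppercases each word so that revisions are easy to see)."""
    return " ".join(w.upper() for w in source_prefix.split())

def mask_k(tokens: list[str], k: int) -> list[str]:
    """Withhold the last k tokens of a hypothesis to reduce caption flicker."""
    return tokens[: max(0, len(tokens) - k)]

def retranslate_stream(source_tokens: list[str], k: int = 2) -> list[str]:
    """Re-translate the growing source after each new token arrives;
    return the sequence of captions actually displayed to the viewer."""
    displayed = []
    for i in range(1, len(source_tokens) + 1):
        prefix = " ".join(source_tokens[:i])
        hypothesis = translate(prefix).split()
        displayed.append(" ".join(mask_k(hypothesis, k)))
    return displayed

captions = retranslate_stream("the quick brown fox jumps".split(), k=2)
```

Because every step translates from scratch, latency stays low and the final caption matches full-sentence quality, but intermediate captions may be rewritten; masking trades a little latency for fewer visible revisions.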