Simultaneous speech translation (SimulST) is the task in which output must be generated from partial, incremental speech input. In recent years, SimulST has gained popularity with the spread of cross-lingual application scenarios, such as international live conferences and streaming lectures, in which on-the-fly speech translation can facilitate users' access to audio-visual content. In this paper, we analyze the characteristics of the SimulST systems developed so far, discussing their strengths and weaknesses. We then concentrate on the evaluation framework required to properly assess systems' effectiveness. To this end, we argue for a broader performance analysis that also includes the user experience standpoint. Indeed, SimulST systems should be evaluated not only in terms of quality/latency measures, but also via task-oriented metrics that account, for instance, for the visualization strategy adopted. In light of this, we highlight the goals the community has achieved and what is still missing.