Computer-aided translation (CAT), the use of software to assist a human translator during translation, has been shown to improve translator productivity. Autocompletion, which suggests translations based on the text fragments a translator has already provided, is a core function of CAT. Previous research in this area has two limitations. First, most work focuses on sentence-level autocompletion (i.e., generating the whole translation as a sentence based on human input), while word-level autocompletion remains under-explored. Second, almost no public benchmarks are available for the autocompletion task in CAT, which may be one reason why research progress in CAT is much slower than in automatic machine translation (MT). In this paper, we propose the task of general word-level autocompletion (GWLAN), derived from a real-world CAT scenario, and construct the first public benchmark to facilitate research on this topic. In addition, we propose an effective method for GWLAN and compare it with several strong baselines. Experiments demonstrate that our proposed method gives significantly more accurate predictions than the baseline methods on our benchmark datasets.