This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus. The corpus is intended to support machine translation for Amharic, an under-resourced language. It is larger than previously compiled Amharic-English corpora and is released for research purposes. Using the corpus, we trained neural machine translation (NMT) and phrase-based statistical machine translation (PBSMT) models. In automatic evaluation, the NMT models outperform the PBSMT models.