Prompting Large Language Model for Machine Translation: A Case Study

Prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap with a systematic study of prompting strategies for translation, examining various factors in prompt template design and demonstration example selection. We further explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning in prompting. Extensive experiments with GLM-130B (Zeng et al., 2022) as the testbed show that 1) the number and the quality of prompt examples matter, and using suboptimal examples degrades translation; 2) several features of prompt examples, such as semantic similarity, show significant Spearman correlation with their prompting performance, yet none of the correlations is strong enough; 3) using pseudo-parallel prompt examples constructed from monolingual data via zero-shot prompting can improve translation; and 4) improved performance is achievable by transferring knowledge from prompt examples selected in other settings. Finally, we provide an analysis of the model outputs and discuss several problems that prompting still suffers from.
