Dialectal Arabic to Modern Standard Arabic (DA-MSA) translation is a challenging task in Machine Translation (MT) due to significant lexical, syntactic, and semantic divergences between Arabic dialects and MSA. Existing automatic evaluation metrics and general-purpose human evaluation frameworks struggle to capture dialect-specific MT errors, hindering progress in translation assessment. This paper introduces Ara-HOPE, a human-centric post-editing evaluation framework designed to address these challenges systematically. The framework comprises a five-category error taxonomy and a decision-tree annotation protocol. In a comparative evaluation of three MT systems (the Arabic-centric Jais, the general-purpose GPT-3.5, and the NLLB-200 baseline), Ara-HOPE reveals systematic performance differences among the systems. The results show that dialect-specific terminology and semantic preservation remain the most persistent challenges in DA-MSA translation. Ara-HOPE establishes a new framework for evaluating Dialectal Arabic MT quality and provides actionable guidance for improving dialect-aware MT systems.