Grammatical Error Correction (GEC) is an important task in natural language processing. Arabic has a complex morphological and syntactic structure, which makes GEC more challenging than in many other languages. Although neural models have improved greatly in recent years, most previous work used individual models and did not exploit the potential benefits of combining different systems. In this paper, we present one of the first multi-system approaches to correcting grammatical errors in Arabic, the Arabic Enhanced Edit Selection System Combination (ArbESC+). The framework collects correction proposals from several models and represents them as numerical features. A classifier then uses these features to decide which corrections to apply. To improve output quality, the framework uses supporting techniques that filter overlapping corrections and estimate the reliability of each decision. A combination of AraT5, ByT5, mT5, AraBART, AraBART+Morph+GEC, and text-editing systems outperformed every single model, with F0.5 scores of 82.63% on the QALB-14 test data, 84.64% on the QALB-15 L1 data, and 65.55% on the QALB-15 L2 data. A key contribution of this work is that it is the first attempt to integrate multiple error-correction systems for Arabic. By building on existing models, it offers a practical step toward advanced tools that will benefit users and researchers of Arabic text processing.
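The edit-selection pipeline described above (collect proposals, turn them into features, classify, filter overlaps) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `Edit` type, the one-binary-feature-per-system encoding, the logistic scorer with hand-set `weights` and `bias`, the 0.5 threshold, and the greedy overlap filter. English toy tokens are used only for readability.

```python
import math
from dataclasses import dataclass

# System names follow the abstract; "TextEdit" is a hypothetical label
# for the text-editing system.
SYSTEMS = ["AraT5", "ByT5", "mT5", "AraBART", "AraBART+Morph+GEC", "TextEdit"]


@dataclass(frozen=True)
class Edit:
    """A proposed correction over a token span (hypothetical structure)."""
    start: int        # first token index covered by the edit (inclusive)
    end: int          # token index where the edit stops (exclusive)
    replacement: str  # replacement text, whitespace-separated tokens


def featurize(edit, proposals):
    """Assumed features: 1.0 for each system that proposed this exact edit."""
    return [1.0 if edit in proposals.get(name, set()) else 0.0 for name in SYSTEMS]


def score(features, weights, bias):
    """Stand-in classifier: logistic regression with given parameters."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def overlaps(a, b):
    """Two edits conflict if their spans are identical or intersect."""
    if (a.start, a.end) == (b.start, b.end):
        return True
    return a.start < b.end and b.start < a.end


def select_edits(proposals, weights, bias, threshold=0.5):
    """Keep confident edits, resolving conflicts in favour of higher scores."""
    candidates = set().union(*proposals.values())
    scored = [(score(featurize(e, proposals), weights, bias), e) for e in candidates]
    scored.sort(key=lambda t: (-t[0], t[1].start, t[1].end, t[1].replacement))
    kept = []
    for prob, edit in scored:
        if prob < threshold:
            break  # all remaining candidates score lower
        if not any(overlaps(edit, k) for k in kept):
            kept.append(edit)
    return sorted(kept, key=lambda e: e.start)


def apply_edits(tokens, edits):
    """Apply non-overlapping edits right to left so indices stay valid."""
    out = list(tokens)
    for e in sorted(edits, key=lambda e: e.start, reverse=True):
        out[e.start:e.end] = e.replacement.split()
    return out


tokens = "he go to school yesterday".split()
proposals = {
    "AraT5": {Edit(1, 2, "went")},
    "ByT5": {Edit(1, 2, "went"), Edit(4, 5, "today")},
    "mT5": {Edit(1, 3, "goes to")},
    "AraBART": {Edit(1, 2, "went")},
    "TextEdit": {Edit(1, 3, "goes to")},
}
kept = select_edits(proposals, weights=[1.0] * len(SYSTEMS), bias=-1.5)
print(apply_edits(tokens, kept))  # ['he', 'went', 'to', 'school', 'yesterday']
```

In this toy run, the edit backed by three systems is kept, the overlapping edit backed by two systems is filtered out despite clearing the threshold, and the single-system edit falls below the threshold. A trained classifier and richer features would replace the hand-set weights in any realistic setting.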