GHaLIB: A Multilingual Framework for Hope Speech Detection in Low-Resource Languages

Hope speech remains underrepresented in Natural Language Processing (NLP). Existing studies focus largely on English, leaving low-resource languages such as Urdu with few resources and limiting the development of tools that foster positive online communication. Although transformer-based architectures have proven effective at detecting hate and offensive speech, little work has applied them to hope speech or, more generally, tested them across a variety of linguistic settings. This paper presents a multilingual framework for hope speech detection with a focus on Urdu. Using pretrained transformer models such as XLM-RoBERTa, mBERT, EuroBERT, and UrduBERT, we apply simple preprocessing and train classifiers. Evaluations on the PolyHope-M 2025 benchmark show strong performance, with F1-scores of 95.2% for Urdu binary classification and 65.2% for Urdu multi-class classification, and similarly competitive results in Spanish, German, and English. These results suggest that existing multilingual models can be applied effectively in low-resource settings, making hope speech easier to identify and supporting a more constructive digital discourse.
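To make the reported metric concrete, the following is a minimal sketch of how an F1-score can be computed for both the binary and the multi-class setting. The abstract does not say which averaging scheme is used, so macro averaging is an assumption here, and the function name `macro_f1` and the labels `hope`, `not_hope`, `a`, `b`, and `c` are hypothetical placeholders rather than the benchmark's actual label set.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list.

    Macro averaging is an assumption; the abstract does not name the scheme.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean
        # of precision and recall.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Binary case with hypothetical labels.
binary_true = ["hope", "hope", "not_hope", "not_hope"]
binary_pred = ["hope", "not_hope", "not_hope", "not_hope"]
binary_score = macro_f1(binary_true, binary_pred)

# Multi-class case with hypothetical labels; class "c" is never predicted,
# so its F1 is 0 and pulls the macro average down.
multi_true = ["a", "b", "c"]
multi_pred = ["a", "b", "b"]
multi_score = macro_f1(multi_true, multi_pred)
```

Under macro averaging, a class the model never predicts correctly contributes a zero to the mean. This is one reason multi-class scores can fall well below binary scores on the same data.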
