Hope speech detection remains underexplored in Natural Language Processing (NLP). Existing studies focus largely on English, leaving low-resource languages such as Urdu without adequate resources and, as a result, limiting the development of tools that foster positive online communication. Although transformer-based architectures have proven effective at detecting hate and offensive speech, little work has applied them to hope speech or, more generally, tested them across diverse linguistic settings. This paper presents a multilingual framework for hope speech detection with a focus on Urdu. Using pretrained transformer models such as XLM-RoBERTa, mBERT, EuroBERT, and UrduBERT, we apply simple preprocessing and train classifiers on the target data. Evaluations on the PolyHope-M 2025 benchmark demonstrate strong performance, with F1-scores of 95.2% on Urdu binary classification and 65.2% on Urdu multi-class classification, and similarly competitive results in Spanish, German, and English. These results show that existing multilingual models can be deployed effectively in low-resource settings, making hope speech easier to identify and helping to build a more constructive digital discourse.
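The abstract mentions "simple preprocessing" before classifier training but does not specify the steps. A minimal sketch of the kind of social-media text cleaning typically applied in such pipelines (removing URLs, user mentions, and redundant whitespace) might look as follows; the exact steps used in the paper are an assumption here.

```python
import re

def preprocess(text: str) -> str:
    """Minimal cleaning before tokenization: drop URLs and @mentions,
    then collapse repeated whitespace. Illustrative only; the paper's
    actual preprocessing steps are not specified in the abstract."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = re.sub(r"\s+", " ", text)           # collapse whitespace
    return text.strip()

print(preprocess("Stay strong! @user https://t.co/abc  we will  rebuild"))
```

The cleaned text would then be fed to the tokenizer of the chosen pretrained model (e.g., XLM-RoBERTa or UrduBERT) for fine-tuning.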