This paper presents our system for SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval. As misinformation spreads rapidly, effective fact-checking is increasingly critical. We introduce TriAligner, a novel approach that uses a dual-encoder architecture trained with contrastive learning and incorporates both native-language text and English translations across different modalities. Our method retrieves claims across multiple languages by learning the relative importance of the different sources during alignment. To enhance robustness, we apply efficient data preprocessing and augmentation using large language models, and we incorporate hard negative sampling to improve representation learning. We evaluate our approach on monolingual and crosslingual benchmarks, demonstrating significant improvements in retrieval accuracy and fact-checking performance over baselines.
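To make the ideas in the abstract concrete, the following is a minimal sketch of how per-source similarities might be fused with learned weights and scored with a contrastive objective that includes hard negatives. It is an illustration under stated assumptions, not the TriAligner implementation: the abstract does not specify the loss, so an InfoNCE-style loss is assumed here, and the function names (`fused_similarity`, `info_nce_loss`), the softmax weighting over sources, and the temperature value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fused_similarity(query_views, claim_views, source_logits):
    """Combine per-source similarities into one score.

    Assumption: each source (e.g. native text, English translation) has
    its own embedding pair, and source_logits are learnable scalars whose
    softmax gives the relative importance of each source.
    """
    weights = softmax(source_logits)
    return sum(
        w * cosine(q, c)
        for w, q, c in zip(weights, query_views, claim_views)
    )


def info_nce_loss(pos_score, neg_scores, temperature=0.05):
    """InfoNCE-style loss for one query (assumed, not from the paper).

    neg_scores may mix in-batch negatives with mined hard negatives;
    a hard negative is one whose score is close to the positive's.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    probs = softmax(logits)
    return -math.log(probs[0])
```

In this sketch, a negative whose score is close to the positive's contributes far more to the loss than an easy one, which is the usual motivation for hard negative sampling. In a real system the embeddings would come from the two encoders, and the source logits would be trained jointly with them.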