This paper describes our participation in the 2022 TREC NeuCLIR challenge. We submitted runs for two of the three languages (Farsi and Russian), focusing on first-stage rankers and on comparing monolingual strategies to Adhoc ones. For the monolingual runs, we first pretrain models on the target language using MLM+FLOPS, and then fine-tune them on MSMARCO translated into that language, using either ColBERT or SPLADE as the retrieval model. For the Adhoc task, we test both query translation (into the target language) and back-translation of the documents (into English). Initial result analysis shows that the monolingual strategy is strong, but that, for the moment, the Adhoc approaches achieve the best results, with back-translating documents outperforming query translation.