Language models (LMs) have proven surprisingly successful at capturing factual knowledge by completing cloze-style fill-in-the-blank questions such as "Punta Cana is located in _." However, while knowledge is both written and queried in many languages, studies on LMs' factual representation ability have almost invariably been performed on English. To assess factual knowledge retrieval in LMs in different languages, we create a multilingual benchmark of cloze-style probes for \langnum typologically diverse languages. To properly handle language variations, we expand probing methods from single- to multi-word entities, and develop several decoding algorithms to generate multi-token predictions. Extensive experimental results provide insights about how well (or poorly) current state-of-the-art LMs perform at this task in languages with more or fewer available resources. We further propose a code-switching-based method to improve the ability of multilingual LMs to access knowledge, and verify its effectiveness on several benchmark languages. Benchmark data and code have been released at https://x-factr.github.io.
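To make the multi-token prediction setup concrete, the following is a minimal sketch (not the paper's released code) of the simplest decoding strategy: inserting a fixed number of mask tokens into a cloze prompt and predicting each mask independently with a multilingual masked LM. It assumes the HuggingFace \texttt{transformers} library and the \texttt{bert-base-multilingual-cased} checkpoint; the prompt, the \texttt{[Y]} placeholder, and the helper name are illustrative only, and the benchmark's actual decoding algorithms are more elaborate.

\begin{verbatim}
# Sketch: independent multi-token mask filling with a multilingual masked LM.
# Assumes HuggingFace `transformers`; names and prompt are illustrative.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def fill_cloze(template: str, num_masks: int) -> str:
    """Replace the [Y] blank with `num_masks` mask tokens and predict each
    mask independently (the simplest multi-token decoding strategy)."""
    masks = " ".join([tokenizer.mask_token] * num_masks)
    inputs = tokenizer(template.replace("[Y]", masks), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the mask positions and take the top-1 token at each one.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    predicted_ids = logits[0, mask_pos].argmax(dim=-1)
    return tokenizer.decode(predicted_ids)

# Example cloze prompt in the style of the benchmark; [Y] marks the blank.
print(fill_cloze("Punta Cana is located in [Y].", num_masks=2))
\end{verbatim}

In practice, the number of masks is not known in advance, which is why more sophisticated decoding (e.g., varying the number of masks and refining predictions) is needed for multi-word entities.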