Pretrained contextual representation models (Peters et al., 2018; Devlin et al., 2018) have pushed forward the state of the art on many NLP tasks. A new release of BERT (Devlin, 2018) includes a model simultaneously pretrained on 104 languages, with impressive performance for zero-shot cross-lingual transfer on a natural language inference task. This paper explores the broader cross-lingual potential of multilingual BERT (mBERT) as a zero-shot language transfer model on five NLP tasks, covering a total of 39 languages from various language families: NLI, document classification, NER, POS tagging, and dependency parsing. We compare mBERT with the best published methods for zero-shot cross-lingual transfer and find mBERT competitive on each task. Additionally, we investigate the most effective strategy for utilizing mBERT in this manner, determine to what extent mBERT generalizes away from language-specific features, and measure factors that influence cross-lingual transfer.
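To make the zero-shot transfer setup concrete, the sketch below shows the general recipe: fine-tune mBERT on task data in one language (typically English) and evaluate directly on another language with no target-language supervision. This is a minimal illustration assuming the HuggingFace transformers library and its bert-base-multilingual-cased checkpoint, neither of which is specified by the paper; the fine-tuning loop and the english_batches iterator are hypothetical placeholders.

```python
# A minimal sketch of zero-shot cross-lingual transfer with mBERT,
# assuming the HuggingFace transformers library (not the paper's own code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # the 104-language mBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

# 1) Fine-tune on English NLI pairs only (loop elided; english_batches is a
#    hypothetical iterator over tokenized English examples).
# for batch in english_batches:
#     loss = model(**batch).loss
#     loss.backward()
#     ...

# 2) Evaluate zero-shot on a target language never seen during fine-tuning.
premise, hypothesis = "La casa es grande.", "La casa es pequeña."  # Spanish
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(-1))  # predicted NLI label, with no Spanish supervision
```

The key design point is that nothing language-specific is added at transfer time: the same fine-tuned weights and shared multilingual vocabulary handle every target language.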