The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most effective methods for building functional NLP systems for low-resource languages. However, for extremely low-resource languages without large-scale monolingual corpora for pre-training or sufficient annotated data for fine-tuning, transfer learning remains an under-studied and challenging task. Moreover, recent work shows that multilingual representations are surprisingly disjoint across languages, bringing additional challenges for transfer onto extremely low-resource languages. In this paper, we propose MetaXL, a meta-learning based framework that learns to transform representations judiciously from auxiliary languages to a target language and brings their representation spaces closer for effective transfer. Extensive experiments on real-world low-resource languages, without access to large-scale monolingual corpora or large amounts of labeled data, for tasks like cross-lingual sentiment analysis and named entity recognition show the effectiveness of our approach. Code for MetaXL is publicly available at github.com/microsoft/MetaXL.