In Natural Language Processing (NLP), one traditionally considers a single task (e.g. part-of-speech tagging) for a single language (e.g. English) at a time. However, recent work has shown that it can be beneficial to take advantage of relatedness between tasks, as well as between languages. In this work I examine the concept of relatedness and explore how it can be utilised to build NLP models that require less manually annotated data. A large selection of NLP tasks is investigated for a substantial language sample comprising 60 languages. The results show potential for joint multitask and multilingual modelling, and hint at linguistic insights which can be gained from such models.