In Natural Language Processing (NLP), one traditionally considers a single task (e.g. part-of-speech tagging) for a single language (e.g. English) at a time. However, recent work has shown that it can be beneficial to take advantage of relatedness between tasks, as well as between languages. In this work I examine the concept of relatedness and explore how it can be utilised to build NLP models that require less manually annotated data. A large selection of NLP tasks is investigated for a substantial language sample comprising 60 languages. The results show potential for joint multitask and multilingual modelling, and hint at linguistic insights which can be gained from such models.