We present substructure distribution projection (SubDP), a technique that projects a distribution over structures in one domain to another by projecting substructure distributions separately. Models for the target domains can then be trained using the projected distributions as soft silver labels. We evaluate SubDP on zero-shot cross-lingual dependency parsing, taking dependency arcs as substructures: we project the predicted dependency arc distributions in the source language(s) to the target language(s), and train a target language parser to fit the resulting distributions. When an English treebank is the only annotation that involves human effort, SubDP achieves better unlabeled attachment score than all prior work on the Universal Dependencies v2.2 (Nivre et al., 2020) test set across eight diverse target languages, as well as the best labeled attachment score on six out of eight languages. In addition, SubDP improves zero-shot cross-lingual dependency parsing with very few (e.g., 50) supervised bitext pairs, across a broader range of target languages.
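To make the projection-and-fit idea concrete, the following is a minimal sketch (not the paper's exact implementation): a source arc distribution is pushed through a soft word-alignment matrix to produce silver head distributions for the target sentence, and the target parser is trained with a soft cross-entropy loss against them. The function names, the bilinear projection form `A^T P A`, and the column renormalization are illustrative assumptions.

```python
import numpy as np

def project_arc_dist(p_src, align):
    """Project a source dependency arc distribution onto a target sentence.

    p_src: (n_src, n_src); column d is a distribution over heads h
           for source dependent d (columns sum to 1).
    align: (n_src, n_tgt); align[i, j] is the (soft) probability that
           source word i aligns to target word j.
    Illustrative bilinear form, not necessarily the paper's exact rule.
    """
    p_tgt = align.T @ p_src @ align  # (n_tgt, n_tgt)
    # Renormalize each column so every target dependent gets a
    # proper distribution over candidate heads.
    col = p_tgt.sum(axis=0, keepdims=True)
    return p_tgt / np.where(col > 0.0, col, 1.0)

def soft_arc_loss(logits, p_silver):
    """Soft cross-entropy between the target parser's per-dependent
    head distributions (softmax over axis 0 of `logits`) and the
    projected silver distributions `p_silver`."""
    logp = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -(p_silver * logp).sum(axis=0).mean()
```

With an identity alignment, the projected distribution equals the source distribution, and the loss reduces to ordinary soft-label cross-entropy on the source arcs.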