Human annotation for syntactic parsing is expensive, and large resources are available only for a fraction of languages. A question we ask is whether one can leverage abundant unlabeled texts to improve syntactic parsers, beyond just using the texts to obtain more generalisable lexical features (i.e., beyond word embeddings). To this end, we propose a novel latent-variable generative model for semi-supervised syntactic dependency parsing. As exact inference is intractable, we introduce a differentiable relaxation to obtain approximate samples and compute gradients with respect to the parser parameters. Our method (Differentiable Perturb-and-Parse) relies on differentiable dynamic programming over stochastically perturbed edge scores. We demonstrate the effectiveness of our approach with experiments on English, French and Swedish.
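The core idea of Perturb-and-Parse, as stated above, is to add stochastic noise to edge scores and then run a differentiable parsing procedure on them. The following is a minimal illustrative sketch, not the paper's implementation: the function name is invented, and the relaxed parser here is a simple per-dependent softmax over candidate heads, which stands in for the paper's actual differentiable dynamic program (a continuously relaxed Eisner algorithm) and ignores the tree constraint.

```python
import numpy as np

def perturb_and_soft_parse(scores, temperature=1.0, rng=None):
    """Illustrative sketch of the perturb-then-relax idea.

    scores[i, j] is the score of word j heading dependent i. We add
    Gumbel(0, 1) noise to every edge score (Perturb-and-MAP style),
    then replace the discrete argmax over heads with a temperature-
    controlled softmax, so the output is differentiable in `scores`.
    NOTE: the real method relaxes a dynamic program over projective
    trees; the softmax here is a simplified stand-in.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    perturbed = scores + rng.gumbel(size=scores.shape)
    z = perturbed / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)  # each row sums to 1
```

As the temperature approaches zero, each row approaches a one-hot head choice (a hard sample); larger temperatures give softer, lower-variance relaxations, which is the usual trade-off when back-propagating through discrete structure.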