AMR parsing has experienced an unprecedented increase in performance in the last three years, due to a combination of effects including architecture improvements and transfer learning. Self-learning techniques have also played a role in pushing performance forward. However, for the most recent high-performing parsers, the effect of self-learning and silver data augmentation seems to be fading. In this paper we propose to overcome these diminishing returns of silver data by combining Smatch-based ensembling techniques with ensemble distillation. In an extensive experimental setup, we push single-model English parser performance to a new state of the art, 85.9 (AMR2.0) and 84.3 (AMR3.0), and return to substantial gains from silver data augmentation. We also attain a new state of the art for cross-lingual AMR parsing for Chinese, German, Italian and Spanish. Finally, we explore the impact of the proposed technique on domain adaptation, and show that it can produce gains rivaling those of human-annotated data for QALD-9 and achieve a new state of the art for BioAMR.
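The core selection step behind Smatch-based ensembling can be sketched as a minimum-Bayes-risk-style consensus choice: given several candidate parses for a sentence, keep the one most similar on average to the others. The sketch below is an illustration only, not the paper's implementation; a real system would compare full AMR graphs with the Smatch metric, whereas here a simple triple-overlap F1 stands in, and the function names and toy triples are invented for the example.

```python
# Hedged sketch of Smatch-based ensembling: from N candidate parses,
# select the one with the highest average similarity to the rest.
# A real implementation would score AMR graph pairs with Smatch;
# triple_f1 below is a simplified stand-in over sets of triples.

def triple_f1(a, b):
    """Stand-in for Smatch: F1 over the triples shared by two parses."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    overlap = len(a & b)
    p, r = overlap / len(b), overlap / len(a)
    return 2 * p * r / (p + r) if p + r else 0.0

def ensemble_select(candidates, sim=triple_f1):
    """Return the candidate parse with highest average similarity to the others."""
    best, best_score = None, -1.0
    for i, cand in enumerate(candidates):
        others = [o for j, o in enumerate(candidates) if j != i]
        score = sum(sim(cand, o) for o in others) / len(others)
        if score > best_score:
            best, best_score = cand, score
    return best

# Toy candidates for "The boy wants to go", as (node, relation, node) triples.
parses = [
    [("want", ":ARG0", "boy"), ("want", ":ARG1", "go"), ("go", ":ARG0", "boy")],
    [("want", ":ARG0", "boy"), ("want", ":ARG1", "go")],
    [("want", ":ARG0", "girl"), ("want", ":ARG1", "go"), ("go", ":ARG0", "boy")],
]
consensus = ensemble_select(parses)  # picks the first, most agreed-upon parse
```

In the distillation setting described above, such consensus parses over silver data would then serve as the training targets for a single student model.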