Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, it remains challenging to obtain competitive performance. Cross-lingual SRL is one promising way to address this problem, and has achieved great advances with the help of model transfer and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the source gold-standard SRL annotations. Experimental results on Universal Proposition Bank show that the translation-based method is highly effective, and that the automatically produced pseudo datasets can improve target-language SRL performance significantly.
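The core idea of building pseudo target-language training data can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes gold SRL labels on the source sentence, a translation of that sentence, and a one-to-one source-to-target word alignment; all names and the toy alignment are hypothetical.

```python
# Hedged sketch (assumed setup, not the paper's method): copy each gold SRL
# label from a source token to its aligned token in the translated sentence,
# yielding a pseudo-labeled target-language training instance.

def project_srl(src_labels, alignment, tgt_len):
    """src_labels: per-token SRL tags on the source sentence (BIO-style).
    alignment: dict mapping source token index -> target token index
               (assumed one-to-one; real alignments are noisier).
    tgt_len: number of tokens in the translated sentence."""
    tgt_labels = ["O"] * tgt_len
    for s_idx, label in enumerate(src_labels):
        t_idx = alignment.get(s_idx)
        if t_idx is not None and label != "O":
            tgt_labels[t_idx] = label
    return tgt_labels

# Toy example with a hypothetical 4-token sentence pair and alignment.
src = ["B-A0", "B-V", "O", "B-A1"]
align = {0: 0, 1: 1, 3: 3}  # source token 2 is unaligned
print(project_srl(src, align, 4))  # ['B-A0', 'B-V', 'O', 'B-A1']
```

In practice, translation-based dataset construction must also handle many-to-one alignments, unaligned argument tokens, and translation noise, which is where quality filtering of the pseudo data matters.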