We introduce a collection of recognizing textual entailment (RTE) datasets focused on figurative language. We leverage five existing datasets annotated for a variety of figurative language -- simile, metaphor, and irony -- and recast them as over 12,500 RTE examples. We evaluate how well state-of-the-art models trained on popular RTE datasets capture different aspects of figurative language. Our results and analyses indicate that these models might not sufficiently capture figurative language: they struggle to perform pragmatic inference and to reason about world knowledge. Ultimately, our datasets provide a challenging testbed for evaluating RTE models.