Question answering (QA) over knowledge graphs and other RDF data has advanced considerably, and a number of systems now provide crisp answers to natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but such systems barely utilize semantic data and knowledge. This paper presents the first system for complex questions that can seamlessly operate over a mixture of RDF datasets and text corpora, or over individual sources, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly by retrieving question-relevant evidence from the RDF data and/or a text corpus, using fine-tuned BERT models. The resulting graph is typically rich but highly noisy. UNIQORN copes with this input using a graph algorithm for Group Steiner Trees that identifies the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that UNIQORN significantly outperforms state-of-the-art methods for QA over heterogeneous sources. The graph-based methodology provides user-interpretable evidence for the complete answering process.
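To make the Group Steiner Tree idea concrete, the following is a minimal Python sketch under stated assumptions. It is not UNIQORN's algorithm: it swaps in a simple shortest-path heuristic that tries every node as a root and connects it to the nearest terminal of each group. The function names (`dijkstra`, `approx_group_steiner_tree`, `answer_candidates`), the toy context graph, the unit edge weights, and the choice to treat non-terminal tree nodes as answer candidates are all illustrative assumptions, not details taken from the abstract.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances and parent pointers from `source`."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def approx_group_steiner_tree(graph, groups):
    """Heuristic (not exact): try each node as root and attach the
    closest terminal of every group via a shortest path.
    Returns (cost, root, edges) or None if some group is unreachable.
    The cost is the sum of path lengths, so shared edges may be
    counted more than once."""
    best = None
    for root in graph:
        dist, parent = dijkstra(graph, root)
        edges, cost, feasible = set(), 0.0, True
        for group in groups:
            reachable = [t for t in group if t in dist]
            if not reachable:
                feasible = False
                break
            node = min(reachable, key=lambda n: dist[n])
            cost += dist[node]
            # Walk back to the root, collecting undirected edges.
            while parent[node] is not None:
                edges.add(frozenset((node, parent[node])))
                node = parent[node]
        if feasible and (best is None or cost < best[0]):
            best = (cost, root, edges)
    return best


def answer_candidates(edges, groups):
    """Assumption of this sketch: tree nodes that are not terminals."""
    tree_nodes = set().union(*edges) if edges else set()
    terminals = set().union(*groups)
    return tree_nodes - terminals


# Hypothetical toy context graph (undirected, unit weights).
graph = {
    "Christopher Nolan": {"Inception": 1.0, "London": 1.0, "director": 1.0},
    "Inception": {"Christopher Nolan": 1.0, "Hans Zimmer": 1.0},
    "London": {"Christopher Nolan": 1.0},
    "director": {"Christopher Nolan": 1.0},
    "Hans Zimmer": {"Inception": 1.0, "Frankfurt": 1.0},
    "Frankfurt": {"Hans Zimmer": 1.0},
}
# One group of matching nodes per question term (hypothetical matching).
groups = [{"Inception"}, {"London"}, {"director"}]

cost, root, edges = approx_group_steiner_tree(graph, groups)
candidates = answer_candidates(edges, groups)
print(cost, root, sorted(candidates))
```

In this toy graph every group has a single terminal, so the problem reduces to an ordinary Steiner tree; with several matching nodes per question term, the heuristic picks whichever terminal in each group is closest to the root. An exact or better-approximating solver would be needed on a large, noisy context graph.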