Query-based open-domain NLP tasks require synthesizing information from long and diverse web results. Current approaches extractively select portions of web text as input to Sequence-to-Sequence models, using methods such as TF-IDF ranking. We propose constructing a local graph-structured knowledge base for each query, which compresses the web search information and reduces redundancy. We show that by linearizing the graph into a structured input sequence, models can encode the graph representations within a standard Sequence-to-Sequence setting. For two generative tasks with very long input text, long-form question answering and multi-document summarization, feeding graph representations as input yields better performance than feeding retrieved text portions.
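The abstract's central idea, linearizing a graph into a structured input sequence, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `(subject, relation, object)` triple format and the `<sub>`/`<rel>`/`<obj>` marker tokens are assumptions chosen for clarity.

```python
# Hypothetical sketch: flatten a local knowledge graph of
# (subject, relation, object) triples into one token sequence that a
# standard Sequence-to-Sequence encoder can consume. Exact duplicate
# triples are dropped, mirroring the redundancy reduction the
# abstract describes. Marker tokens are illustrative assumptions.

def linearize_graph(triples):
    """Turn a list of (subject, relation, object) triples into a
    single structured input string, skipping duplicates."""
    seen = set()
    parts = []
    for triple in triples:
        if triple in seen:  # skip facts repeated across web results
            continue
        seen.add(triple)
        s, r, o = triple
        parts.append(f"<sub> {s} <rel> {r} <obj> {o}")
    return " ".join(parts)

# Toy example with a repeated fact, as might come from several
# overlapping web pages returned for one query.
triples = [
    ("graphene", "is_a", "2D material"),
    ("graphene", "conducts", "electricity"),
    ("graphene", "is_a", "2D material"),  # duplicate, will be dropped
]
print(linearize_graph(triples))
```

The resulting string can then be tokenized and fed to any off-the-shelf encoder-decoder model without architectural changes, which is the practical appeal of linearization over graph-specific encoders.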