Improving Passage Retrieval with Zero-Shot Question Generation

We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or keyword-based), does not require any domain- or task-specific training (and therefore is expected to generalize better to data distribution shifts), and provides rich cross-attention between query and passage (i.e. it must explain every token in the question). When evaluated on a number of open-domain retrieval datasets, our re-ranker improves strong unsupervised retrieval models by 6%-18% absolute and strong supervised models by up to 12% in terms of top-20 passage retrieval accuracy. We also obtain new state-of-the-art results on full open-domain question answering by simply adding the new re-ranker to existing models with no further changes.
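The re-scoring step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract calls for a pre-trained language model, while `toy_token_log_probs` here is a hypothetical stand-in (a smoothed unigram model estimated from the passage) so that the example runs with the standard library alone. The function names, whitespace tokenization, smoothing constants, and the use of the mean (rather than the sum) of token log-probabilities are all assumptions. For a fixed question, mean and sum give the same ranking.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A scorer maps (passage, question tokens) to one log-probability per
# question token, i.e. log p(q_t | passage, ...).
TokenLogProbFn = Callable[[str, Sequence[str]], List[float]]


def toy_token_log_probs(
    passage: str,
    question_tokens: Sequence[str],
    alpha: float = 0.1,        # assumed smoothing constant
    vocab_size: int = 50_000,  # assumed vocabulary size
) -> List[float]:
    """Hypothetical stand-in for a pre-trained language model.

    Uses an additively smoothed unigram model estimated from the passage,
    so it ignores word order and earlier question tokens. A real system
    would replace this with the model's conditional token log-probabilities.
    """
    counts = Counter(passage.lower().split())
    total = sum(counts.values())
    denom = total + alpha * vocab_size
    return [math.log((counts[tok.lower()] + alpha) / denom)
            for tok in question_tokens]


def question_log_likelihood(
    passage: str, question: str, token_log_probs: TokenLogProbFn
) -> float:
    """Mean log-probability of the question tokens given the passage."""
    tokens = question.split()  # whitespace tokenization, for illustration only
    log_probs = token_log_probs(passage, tokens)
    return sum(log_probs) / max(len(log_probs), 1)


def rerank(
    question: str,
    passages: Sequence[str],
    token_log_probs: TokenLogProbFn = toy_token_log_probs,
) -> List[Tuple[float, str]]:
    """Re-score retrieved passages and sort them, best first."""
    scored = [(question_log_likelihood(p, question, token_log_probs), p)
              for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    retrieved = [
        "The Nile is a river in Africa",
        "Paris is the capital of France",
    ]
    for score, passage in rerank("what is the capital of France", retrieved):
        print(f"{score:.3f}  {passage}")
```

Because every question token contributes a term to the score, a passage that fails to account for part of the question is penalized. The scorer is passed in as a function so that the retrieval method producing `passages` and the language model producing the token log-probabilities can each be swapped independently.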
