In plenty of machine learning applications, the most relevant items for a particular query should be efficiently extracted, while the relevance function is based on a highly-nonlinear model, e.g., DNNs or GBDTs. Due to the high computational complexity of such models, exhaustive search is infeasible even for medium-scale problems. To address this issue, we introduce Relevance Proximity Graphs (RPG): an efficient non-exhaustive approach that provides a high-quality approximate solution for maximal relevance retrieval. Namely, we extend the recent similarity graphs framework to the setting, when there is no similarity measure defined on item pairs, which is a common practical use-case. By design, our approach directly maximizes off-the-shelf relevance functions and does not require any proxy auxiliary models. Via extensive experiments, we show that the developed method provides excellent retrieval accuracy while requiring only a few model computations, outperforming indirect models. We open-source our implementation as well as two large-scale datasets to support further research on relevance retrieval.
翻译:在大量机器学习应用程序中,对特定查询最相关的项目应高效地提取,而相关功能则以高度非线性模型为基础,例如DNNs或GBDTs。由于这些模型的计算复杂性很高,即使对于中等规模的问题也不可能进行彻底搜索。为了解决这一问题,我们引入了相关性近似图(RPG):一种有效的非详尽无遗的方法,为最大相关性检索提供了高质量的近似解决方案。也就是说,我们将最近的相似图形框架扩展至环境环境,而对于项目配对没有界定相似的计量,这是一个共同的实用使用案例。通过设计,我们的方法直接使现成相关性功能最大化,而不需要任何替代的辅助模型。通过广泛的实验,我们表明,开发的方法提供了极好的检索准确性,而只需要少数模型的计算,优于非直接模型。我们将我们的执行情况以及两个大尺度的数据集作为来源,以支持对相关性检索的进一步研究。