Sublinear Maximum Inner Product Search using Concomitants of Extreme Order Statistics

From arXiv. A short version with a new title, "Simple Yet Efficient Algorithms for Maximum Inner Product Search via Extreme Order Statistics", appears in KDD 2021.

We propose a novel dimensionality reduction method for maximum inner product search (MIPS), named CEOs, based on the theory of concomitants of extreme order statistics. Utilizing the asymptotic behavior of these concomitants, we show that a few dimensions associated with the extreme values of the query signature are enough to estimate inner products. Since CEOs only uses the sign of a small subset of the query signature for estimation, we can precompute all inner product estimators accurately before querying. These properties yield a sublinear MIPS algorithm with an exponential indexing space complexity. We show that our exponential space is optimal for the $(1 + \epsilon)$-approximate MIPS on a unit sphere. The search recall of CEOs can be theoretically guaranteed under a mild condition. To deal with the exponential space complexity, we propose two practical variants, sCEOs-TA and coCEOs, that use linear space for solving MIPS. sCEOs-TA exploits the threshold algorithm (TA) and provides higher search recall than competitive MIPS solvers. coCEOs is a data and dimension co-reduction technique and outperforms sCEOs-TA when high recall is required. Empirically, both variants are very simple to implement and achieve at least 100x speedup compared to brute-force search while returning top-10 MIPS with accuracy of at least 90% on many large-scale data sets.
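The estimation idea in the abstract can be illustrated with a small sketch. It is a simplified reading of the abstract, not the authors' implementation: it draws Gaussian random projections, keeps only the few projections where the query's projected value is most extreme, and sums the sign-adjusted projected values of each data point as a stand-in for its inner product with the query. The names `build_index`, `estimate_scores`, and `top_k`, and the parameters `s0` and `n_cand`, are assumptions made for illustration. The sketch scores every point, so it runs in linear time and does not reproduce the sublinear algorithm, sCEOs-TA, or coCEOs.

```python
import random


def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def build_index(data, n_proj, seed=0):
    """Draw n_proj Gaussian vectors and project every data point onto them."""
    rng = random.Random(seed)
    dim = len(data[0])
    proj = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_proj)]
    projected = [[dot(x, r) for r in proj] for x in data]
    return proj, projected


def estimate_scores(query, proj, projected, s0):
    """Score each point using only the s0 projections where |q.r| is largest."""
    q_proj = [dot(query, r) for r in proj]
    # Indices of the s0 most extreme query projections (assumed selection rule).
    top = sorted(range(len(proj)), key=lambda i: -abs(q_proj[i]))[:s0]
    # Only the sign of the query projection is used, not its magnitude.
    signs = [1.0 if q_proj[i] >= 0 else -1.0 for i in top]
    return [sum(s * row[i] for s, i in zip(signs, top)) for row in projected]


def top_k(query, data, proj, projected, s0, k, n_cand):
    """Keep the n_cand best estimated points, then re-rank them exactly."""
    scores = estimate_scores(query, proj, projected, s0)
    cand = sorted(range(len(data)), key=lambda i: -scores[i])[:n_cand]
    cand.sort(key=lambda i: -dot(data[i], query))
    return cand[:k]
```

A call such as `top_k(q, data, proj, projected, s0=10, k=10, n_cand=50)` returns the indices of the candidates with the largest exact inner products among the `n_cand` points ranked highest by the estimate. Because the scores depend on the query only through the chosen indices and their signs, the per-point sums could in principle be computed ahead of time, which is the property the abstract relies on for precomputation.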
