通过高效平均扭曲平均嵌入器搜索近邻 (Near Neighbor Search via Efficient Average Distortion Embeddings)

A recent series of papers by Andoni, Naor, Nikolov, Razenshteyn, and Waingarten (STOC 2018, FOCS 2018) has given approximate near neighbour search (NNS) data structures for a wide class of distance metrics, including all norms. In particular, these data structures achieve approximation on the order of $p$ for $\ell_p^d$ norms with space complexity nearly linear in the dataset size $n$ and polynomial in the dimension $d$, and query time sub-linear in $n$ and polynomial in $d$. The main shortcoming is the exponential in $d$ pre-processing time required for their construction. In this paper, we describe a more direct framework for constructing NNS data structures for general norms. More specifically, we show via an algorithmic reduction that an efficient NNS data structure for a given metric is implied by an efficient average distortion embedding of it into $\ell_1$ or into Euclidean space. In particular, the resulting data structures require only polynomial pre-processing time, as long as the embedding can be computed in polynomial time. As a concrete instantiation of this framework, we give an NNS data structure for $\ell_p$ with efficient pre-processing that matches the approximation factor, space and query complexity of the aforementioned data structure of Andoni et al. On the way, we resolve a question of Naor (Analysis and Geometry in Metric Spaces, 2014) and provide an explicit, efficiently computable embedding of $\ell_p$, for $p \ge 2$, into $\ell_2$ with (quadratic) average distortion on the order of $p$. We expect our approach to pave the way for constructing efficient NNS data structures for all norms.

翻译：由Andoni、Naor、Nikolov、Razenshteyn和Waingarten(STOC 2018、FOCS 2018)撰写的最近一系列论文提供了近邻搜索(NNS)数据结构,包括所有规范。特别是,这些数据结构在以美元计价的数据集大小接近线性地接近美元,空间复杂度为美元,空间复杂度接近线性值,以美元计价,以美元计价分线查询时间分流,以美元计价,以美元计价计算。主要缺陷是其构建所需的以美元计价的预处理时间。在本文中,我们描述了为一般规范构建 NNS 数据结构的更直接框架。更具体地说,我们通过算法的缩减,将NNS 数据结构有效平均地嵌入美元=1美元,或以美元计价价制价, 特别是,由此形成的数据结构只需要以美元计价前的多货币值时间, 并且以美元计价制价制价为我们的数据结构。