Hierarchical Navigable Small World (HNSW) is widely adopted for approximate nearest neighbor search (ANNS) for its ability to deliver high recall with low latency on large-scale, high-dimensional embeddings. The exploration factor, commonly referred to as ef, is a key parameter in HNSW-based vector search that balances accuracy and efficiency. However, existing systems typically rely on manually and statically configured ef values that are uniformly applied across all queries. This results in a distribution-agnostic configuration that fails to account for the non-uniform and skewed nature of real-world embedding data and query workloads. As a consequence, HNSW-based systems suffer from two key practical issues: (i) the absence of recall guarantees, and (ii) inefficient ANNS performance due to over- or under-searching. In this paper, we propose Adaptive-ef (Ada-ef), a data-driven, update-friendly, query-adaptive approach that dynamically configures ef for each query at runtime to approximately meet a declarative target recall with minimal computation. The core of our approach is a theoretically grounded statistical model that captures the similarity distribution between each query and the database vectors. Based on this foundation, we design a query scoring mechanism that distinguishes between queries requiring only small ef and those that need larger ef to meet a target recall, and accordingly assigns an appropriate ef to each query. Experimental results on real-world embeddings produced by state-of-the-art Transformer models from OpenAI and Cohere show that, compared with state-of-the-art learning-based adaptive approaches, our method achieves the target recall while avoiding both over- and under-searching, reducing online query latency by up to 4x, offline computation time by 50x, and offline memory usage by 100x.
翻译:分层可导航小世界(HNSW)因其能够在大规模高维嵌入向量上实现高召回率与低延迟,被广泛应用于近似最近邻搜索(ANNS)。探索因子(通常称为ef)是基于HNSW的向量搜索中平衡精度与效率的关键参数。然而,现有系统通常依赖手动静态配置的ef值,并统一应用于所有查询。这种配置方式忽略了数据分布特性,未能考虑现实世界嵌入数据与查询负载的非均匀性与偏斜性,导致基于HNSW的系统面临两个关键实践问题:(i)缺乏召回率保证;(ii)因搜索过度或不足而导致的ANNS性能低下。本文提出自适应ef(Ada-ef),一种数据驱动、支持动态更新、查询自适应的方案,能够在运行时为每个查询动态配置ef,以在最小计算开销下近似满足声明式目标召回率。该方法的核心是一个基于理论推导的统计模型,用于刻画每个查询与数据库向量之间的相似度分布。在此基础上,我们设计了一种查询评分机制,能够区分仅需较小ef即可满足目标召回率的查询与需要较大ef的查询,并据此为每个查询分配合适的ef。在由OpenAI和Cohere的前沿Transformer模型生成的真实世界嵌入数据上的实验结果表明,与当前最先进的基于学习的自适应方法相比,我们的方法在实现目标召回率的同时避免了搜索过度与不足,将在线查询延迟降低最高达4倍,离线计算时间减少50倍,离线内存占用降低100倍。