Scaling Approximate Nearest Neighbor Search (ANNS) to billions of vectors requires distributed indexes that balance accuracy, latency, and throughput. Yet existing index designs struggle with this tradeoff. This paper presents SPIRE, a scalable vector index based on two design decisions. First, it identifies a balanced partition granularity that avoids read-cost explosion. Second, it introduces an accuracy-preserving recursive construction that builds a multi-level index with predictable search cost and stable accuracy. In experiments with up to 8 billion vectors across 46 nodes, SPIRE achieves high scalability and up to 9.64X higher throughput than state-of-the-art systems.
翻译:将近似最近邻搜索扩展到数十亿向量规模需要分布式索引在精度、延迟和吞吐量之间取得平衡。然而现有索引设计难以兼顾这些指标。本文提出SPIRE——一种基于两项设计决策的可扩展向量索引。首先,该方法确定了可避免读取成本激增的均衡分区粒度。其次,引入精度保持的递归构建机制,可构建具有可预测搜索成本与稳定精度的多级索引。在46个节点上对高达80亿向量的实验表明,SPIRE实现了高可扩展性,其吞吐量较现有最优系统最高提升9.64倍。