An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors - 专知 Papers


Vectorization · Approximation · Sparsity · Inner product · Streaming

January 25, 2023

An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors


Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty

Maximum Inner Product Search, or top-k retrieval, on sparse vectors is well understood in information retrieval, with a number of mature algorithms that solve it exactly. However, all existing algorithms are tailored to text and frequency-based similarity measures. To achieve optimal memory footprint and query latency, they rely on the near stationarity of documents and on laws governing natural languages. We consider, instead, a setup in which collections are streaming -- necessitating dynamic indexing -- and where indexing and retrieval must work with arbitrarily distributed real-valued vectors. As we show, existing algorithms are no longer competitive in this setup, even against naive solutions. We investigate this gap and present a novel approximate solution, called Sinnamon, that can efficiently retrieve the top-k results for sparse real-valued vectors drawn from arbitrary distributions. Notably, Sinnamon offers levers to trade off memory consumption, latency, and accuracy, making the algorithm suitable for constrained applications and systems. We give theoretical results on the error introduced by the approximate nature of the algorithm, and present an empirical evaluation of its performance on two hardware platforms and synthetic and real-valued datasets. We conclude by laying out concrete directions for future research on this general top-k retrieval problem over sparse vectors.
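To make the problem setup concrete, here is a minimal sketch of an exact, brute-force baseline for top-k inner product retrieval over a streaming collection of sparse real-valued vectors. It is not Sinnamon, and it is not taken from the paper. The names `NaiveStreamingIndex`, `inner_product`, `insert`, `delete`, and `top_k` are hypothetical, and the representation (a dictionary from coordinate index to nonzero value) is an assumption chosen for brevity.

```python
import heapq
from typing import Dict, List, Tuple

# Assumed representation: coordinate index -> nonzero real value.
SparseVector = Dict[int, float]


def inner_product(a: SparseVector, b: SparseVector) -> float:
    """Exact inner product; only coordinates present in both vectors contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum((v * b[i] for i, v in a.items() if i in b), 0.0)


class NaiveStreamingIndex:
    """Hypothetical exact baseline: a linear scan over a mutable collection."""

    def __init__(self) -> None:
        self._docs: Dict[str, SparseVector] = {}

    def insert(self, doc_id: str, vector: SparseVector) -> None:
        # Keep nonzero coordinates only; re-inserting an id overwrites it.
        self._docs[doc_id] = {i: v for i, v in vector.items() if v != 0.0}

    def delete(self, doc_id: str) -> None:
        # Deleting an unknown id is a no-op.
        self._docs.pop(doc_id, None)

    def top_k(self, query: SparseVector, k: int) -> List[Tuple[str, float]]:
        # Score every stored vector, then keep the k largest scores.
        scored = (
            (doc_id, inner_product(query, vec))
            for doc_id, vec in self._docs.items()
        )
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

Inserts and deletes are cheap here, but each query touches every stored vector, so latency grows linearly with the collection size. That cost is what an approximate method would aim to trade against memory and accuracy.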


