Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation

Topics: inner product · linear · estimation/estimator · weight · sparse

May 5, 2023

Aline Bessa, Majid Daliri, Juliana Freire, Cameron Musco, Christopher Musco, Aécio Santos, Haoxiang Zhang

From arXiv, 23 pages, 6 figures

We present a new approach for computing compact sketches that can be used to approximate the inner product between pairs of high-dimensional vectors. Based on the Weighted MinHash algorithm, our approach admits strong accuracy guarantees that improve on the guarantees of popular linear sketching approaches for inner product estimation, such as CountSketch and Johnson-Lindenstrauss projection. Specifically, while our method admits guarantees that exactly match linear sketching for dense vectors, it yields significantly lower error for sparse vectors with limited overlap between non-zero entries. Such vectors arise in many applications involving sparse data. They are also important in increasingly popular dataset search applications, where inner product sketches are used to estimate data covariance, conditional means, and other quantities involving columns in unjoined tables. We complement our theoretical results by showing that our approach empirically outperforms existing linear sketches and unweighted hashing-based sketches for sparse vectors.
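
For context, the linear-sketching baselines named in the abstract (CountSketch and Johnson-Lindenstrauss projection) can be illustrated in a few lines. The following is a minimal sketch of those baselines only, not the authors' Weighted MinHash method; the function names, dimensions, random seed, and test vectors are all assumptions chosen for the example. Both estimators compress each vector to `m` numbers and estimate the inner product as the dot product of the two sketches.

```python
import numpy as np

def countsketch(x, m, bucket, sign):
    """CountSketch: add each signed entry of x into one of m counters.

    bucket[i] and sign[i] play the role of hash functions and must be
    shared by every vector that will later be compared.
    """
    s = np.zeros(m)
    np.add.at(s, bucket, sign * x)  # unbuffered add handles bucket collisions
    return s

def jl_sketch(x, proj):
    """Johnson-Lindenstrauss sketch: multiply by a shared random matrix."""
    return proj @ x

# Hypothetical sizes: n-dimensional inputs compressed to m numbers each.
n, m = 2000, 256
rng = np.random.default_rng(0)

# Two sparse vectors whose non-zero entries overlap only on indices 150..199.
a = np.zeros(n)
b = np.zeros(n)
a[:200] = rng.normal(size=200)
b[150:350] = rng.normal(size=200)

# Shared randomness for both sketches.
bucket = rng.integers(0, m, size=n)
sign = rng.choice([-1.0, 1.0], size=n)
proj = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

exact = a @ b
cs_est = countsketch(a, m, bucket, sign) @ countsketch(b, m, bucket, sign)
jl_est = jl_sketch(a, proj) @ jl_sketch(b, proj)

print(f"exact={exact:.3f}  CountSketch={cs_est:.3f}  JL={jl_est:.3f}")
```

Both estimates are unbiased, but their typical error is on the order of `np.linalg.norm(a) * np.linalg.norm(b) / np.sqrt(m)`, which depends on the full norms of the vectors even when only a few non-zero entries overlap. That is the regime in which the abstract reports lower error for the Weighted MinHash approach.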
