Learned Monotone Minimal Perfect Hashing

April 21, 2023

Paolo Ferragina, Hans-Peter Lehmann, Peter Sanders, Giorgio Vinciguerra

A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications range from databases, search engines, and data encryption to pattern-matching algorithms. In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we resolve the collisions that may arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 35% less space than the next best competitor while achieving about 16 times faster queries. On real-world datasets, its space usage is very close to or much better than that of the best competitors, while its queries are up to 19 times faster than those of the next larger competitor. Regarding construction, LeMonHash improves by a factor of up to 2 over the competitor with the next-best space usage. We also investigate the case of keys being variable-length strings, introducing LeMonHash-VL: its space usage is within 10% of the best competitors, while its queries are up to 3 times faster.
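The two-stage idea described above (a monotone model that maps each key to a rank estimate, plus small stored integers that disambiguate keys sharing an estimate) can be illustrated with a toy Python sketch. It is not the authors' implementation: the class name `ToyLeMonHash` and its methods are hypothetical, a single global linear model stands in for the PGM-index, per-bucket offsets are kept in a plain list, and a Python `dict` stands in for BuRR, so the sketch stores the keys themselves and saves no space.

```python
from typing import Dict, List


class ToyLeMonHash:
    """Toy monotone minimal perfect hash (MMPHF) for distinct integer keys.

    Simplifying assumptions (not the paper's implementation):
    - one global linear model stands in for the PGM-index;
    - bucket offsets are kept in a plain list (no compression);
    - a dict stands in for the BuRR retrieval structure, so this version
      stores the keys themselves and is NOT space-efficient.
    """

    def __init__(self, keys: List[int]) -> None:
        s = sorted(set(keys))
        if not s:
            raise ValueError("need at least one key")
        self.n = len(s)
        self.lo, self.hi = s[0], s[-1]
        # Count how many keys the monotone model sends to each bucket.
        counts = [0] * self.n
        for x in s:
            counts[self._bucket(x)] += 1
        # offsets[b] = number of keys in buckets < b (global rank of bucket start).
        self.offsets = [0] * (self.n + 1)
        for b in range(self.n):
            self.offsets[b + 1] = self.offsets[b] + counts[b]
        # Monotone model + sorted keys => each bucket is a contiguous run of
        # ranks, so rank(x) = offsets[bucket(x)] + position of x in its bucket.
        # Only keys in buckets of size > 1 need a stored "small integer".
        self.local_rank: Dict[int, int] = {}
        for rank, x in enumerate(s):
            b = self._bucket(x)
            if counts[b] > 1:
                self.local_rank[x] = rank - self.offsets[b]

    def _bucket(self, x: int) -> int:
        # Nondecreasing integer linear map of [lo, hi] onto [0, n - 1],
        # clamped so that keys outside the range still get a valid bucket.
        b = (x - self.lo) * self.n // (self.hi - self.lo + 1)
        return min(max(b, 0), self.n - 1)

    def __call__(self, x: int) -> int:
        b = self._bucket(x)
        size = self.offsets[b + 1] - self.offsets[b]
        if size <= 1:
            return self.offsets[b]
        # For keys not in S the result is arbitrary, as an MMPHF allows.
        return self.offsets[b] + self.local_rank.get(x, 0)

    def retrieval_bits(self) -> int:
        """Bits a retrieval structure would need: ceil(log2(size)) per colliding key."""
        total = 0
        for b in range(self.n):
            size = self.offsets[b + 1] - self.offsets[b]
            if size > 1:
                total += size * (size - 1).bit_length()
        return total
```

For example, building it on the keys `[3, 10, 11, 12, 50, 51, 99, 1000]` returns ranks 0 to 7 in key order, but the single line sends seven of the eight keys to bucket 0, so a retrieval structure would need 3 bits for each of them (21 bits in total, as reported by `retrieval_bits()`). On evenly spaced keys such as `range(0, 80, 10)`, every bucket holds one key and no bits are needed. Skewed inputs like the first one are where a piecewise linear model, such as the PGM-index used in the paper, should fit the key distribution more closely than one global line.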
