Parallel and External-Memory Construction of Minimal Perfect Hash Functions with PTHash

A minimal perfect hash function $f$ for a set $S$ of $n$ keys is a bijective function of the form $f : S \rightarrow \{0,\ldots,n-1\}$. These functions are important for many practical applications in computing, such as search engines, computer networks, and databases. Several algorithms have been proposed to build minimal perfect hash functions that scale well to large sets, retain fast evaluation time, and take very little space, e.g., 2-3 bits/key. PTHash is one such algorithm, achieving very fast evaluation in compressed space, typically several times faster than other techniques. In this work, we propose a new construction algorithm for PTHash enabling: (1) multi-threading, to build functions either more quickly or more space-efficiently, and (2) external-memory processing, to scale to inputs much larger than the available internal memory. Only a few other algorithms in the literature share these features, despite their significant practical impact. We conduct an extensive experimental assessment on large real-world string collections and show that, with respect to other techniques, PTHash is competitive in construction time and space consumption while retaining 2-6$\times$ better lookup time.
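To make the definition concrete, the following is a toy sketch of a minimal perfect hash function built by brute-force seed search. It is not the PTHash algorithm and is only practical for very small sets; the names `build_mphf` and `evaluate`, and the choice of BLAKE2b as the underlying hash, are assumptions made for illustration.

```python
import hashlib


def _hash(key: str, seed: int) -> int:
    """Seeded 64-bit hash of a string key (BLAKE2b is an illustrative choice)."""
    data = seed.to_bytes(8, "little") + key.encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def build_mphf(keys, max_seed=1_000_000):
    """Return a seed for which key -> _hash(key, seed) % n is a bijection onto {0..n-1}.

    Brute force: the expected number of trials grows roughly like e^n,
    so this is only usable for a handful of keys.
    """
    keys = list(keys)
    n = len(keys)
    if len(set(keys)) != n:
        raise ValueError("keys must be distinct")
    if n == 0:
        return 0
    for seed in range(max_seed):
        # The map is a bijection exactly when all n keys land in distinct slots.
        if len({_hash(k, seed) % n for k in keys}) == n:
            return seed
    raise RuntimeError("no suitable seed found; the key set is too large for brute force")


def evaluate(key: str, seed: int, n: int) -> int:
    """Evaluate f(key) for a key from the original set."""
    return _hash(key, seed) % n


if __name__ == "__main__":
    keys = ["search", "engine", "network", "database", "hash"]
    seed = build_mphf(keys)
    for k in keys:
        print(k, "->", evaluate(k, seed, len(keys)))
```

Here the whole function is described by a single integer seed, but finding that seed takes exponential time. Practical schemes such as PTHash keep evaluation fast and space small while making construction feasible for large key sets.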
