Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location by returning the inter-key interval containing the query. This is in contrast to other dictionary data structures, like hash tables, that only report a failed search. We address the question "what is the additional cost of determining approximate locations for non-key queries"? We prove that for two-way comparison trees this additional cost is at most 1. Our proof is based on a novel probabilistic argument that involves converting a search tree that does not identify non-key queries into a random tree that does.
翻译:搜索树通常用于对一组存储的密钥进行访问操作。 如果这组密钥是静态的,而且会籍查询的概率是事先知道的, 那么人们就可以预估最佳的搜索树, 也就是将预期访问成本降到最低。 对于非密钥查询, 搜索树可以通过返回含有查询的密钥间隔来确定其大致位置。 这与其他词典数据结构( 如散列表格) 相比, 仅报告搜索失败。 我们处理的问题是“ 确定非密钥查询的大致位置的额外费用是多少? ” 我们证明, 双向比较树的这一额外费用最多是 1 。 我们的证据是基于一个新颖的概率论, 涉及将未识别非密钥查询的搜索树转换成随机树。