There is a plethora of data structures, algorithms, and frameworks dealing with major data-stream problems like estimating the frequency of items, answering set membership, association and multiplicity queries, and several other statistics that can be extracted from voluminous data streams. In this survey, we are focusing on exploring randomized data structures called Bloom Filters. This data structure answers whether an item exists or not in a data stream with a false positive probability fpp. In this survey, many variants of the Bloom filter will be covered by showing the strengths of each structure and its drawbacks i.e. some Bloom filters deal with insertion and deletions and others don't, some variants use the memory efficiently but increase the fpp where others pay the trade-off in the reversed way. Furthermore, in each Bloom filter structure, the false positive probability will be highlighted alongside the most important technical details showing the improvement it is presenting, while the main aim of this work is to provide an overall comparison between the variants of the Bloom filter structure according to the application domain that it fits in.
翻译:处理主要数据流问题的数据结构、算法和框架繁多,如估计项目的频率、回答设定的成员、关联和多重查询,以及可以从大量数据流中提取的其他几个统计数据。在本次调查中,我们的重点是探索随机化的数据结构,称为Bloom过滤器。这个数据结构用假正概率折叠来回答数据流中是否有一个项目存在。在本次调查中,Bloom过滤器的许多变种将通过显示每个结构的优点及其缺点来覆盖,即一些Bloom过滤器处理插入和删除,而另一些则不,有些变种有效地使用存储器,但增加了其他变种以逆向方式支付交易的折叠式。此外,在每个Bloom过滤器结构中,将突出虚假的积极概率,同时突出显示其正在显示的改进情况的最重要的技术细节,而这项工作的主要目的是根据所适合的应用领域对Bloom过滤器结构的变种进行总体比较。