Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but do not allow parallelism for creating sketches using multiple threads or querying them while they are being built. We present a generic approach to parallelising data sketches efficiently, while bounding the error that such parallelism introduces. Utilising relaxed semantics and the notion of strong linearisability we prove our algorithm's correctness and analyse the error it induces in two specific sketches. Our implementation achieves high scalability while keeping the error small.
翻译:数据草图是长流的大致简洁摘要。 它们被广泛用于处理大量数据并实时回答关于这些数据的统计问题。 现有的图书馆制作草图非常快, 但不允许使用多条线条制作草图, 或者在建建中时用多条线条来查询这些草图。 我们提出了一个通用的方法来有效地平行数据草图, 同时将这种平行主义带来的错误捆绑起来。 利用宽松的语义和强线性概念, 我们证明了算法的正确性, 并在两个具体的草图中分析了它引起的错误。 我们的落实在保持小错误的同时, 实现了高度的可缩放性 。