Towards Approximate Query Enumeration with Sublinear Preprocessing Time

This paper aims to provide extremely efficient algorithms for approximate query enumeration on sparse databases that come with performance and accuracy guarantees. We introduce a new model for approximate query enumeration on classes of relational databases of bounded degree. We first prove that on databases of bounded degree, any local first-order definable query can be enumerated approximately with constant delay after a constant-time preprocessing phase. We extend this, showing that on databases of bounded tree-width and bounded degree, every query expressible in first-order logic can be enumerated approximately with constant delay after a sublinear (more precisely, polylogarithmic) time preprocessing phase. Durand and Grandjean (ACM Transactions on Computational Logic 2007) proved that exact enumeration of first-order queries on databases of bounded degree can be done with constant delay after a linear-time preprocessing phase. Hence we achieve a significant speed-up in the preprocessing phase. Since sublinear running time does not allow reading the whole input database even once, sacrificing some accuracy is inevitable for our speed-up. Nevertheless, our enumeration algorithms come with guarantees: with high probability, (1) only tuples that are answers to the query or 'close' to being answers are enumerated, and (2) if the proportion of tuples that are answers to the query is sufficiently large, then all answers will be enumerated. Here the notion of 'closeness' is a tuple edit distance in the input database. For local first-order queries, only actual answers are enumerated, strengthening (1). Moreover, both the 'closeness' and the proportion required in (2) are controllable. We combine methods from property testing of bounded-degree graphs with logic and query enumeration, which we believe can inspire further research.
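To make the two-phase idea concrete, the following is a minimal toy sketch in Python; it is not the paper's algorithm. It assumes a bounded-degree undirected graph stored as adjacency sets, and it uses a hypothetical unary local query ("the vertex lies on a triangle") that can be decided by looking only at a vertex's 1-neighbourhood. The names `in_triangle`, `preprocess`, `enumerate_answers`, `num_samples`, and `threshold` are all illustrative assumptions. Preprocessing probes a fixed number of random vertices, so its cost does not grow with the database; enumeration outputs only actual answers, and may output nothing when the estimated proportion of answers is small. The plain scan used here does not reproduce the constant-delay guarantee claimed in the abstract.

```python
import random


def in_triangle(adj, v):
    """Hypothetical local query: does v lie on a triangle?

    Only the neighbours of v and their adjacency sets are inspected, so on a
    graph of degree at most d this costs O(d^2) set lookups.
    """
    nbrs = adj[v]
    return any(u in adj[w] for u in nbrs for w in nbrs if u != w)


def preprocess(adj, vertices, num_samples, rng):
    """Estimate the fraction of answers from a fixed number of random probes.

    The number of probes is a parameter, independent of the database size.
    """
    hits = sum(in_triangle(adj, rng.choice(vertices)) for _ in range(num_samples))
    return hits / num_samples


def enumerate_answers(adj, vertices, estimate, threshold):
    """Yield actual answers only; give up if answers appear to be rare.

    Assumption: a simple linear scan stands in for the enumeration phase.
    """
    if estimate < threshold:
        return
    for v in vertices:
        if in_triangle(adj, v):
            yield v


# Toy database: a triangle 0-1-2 plus a separate edge 3-4 (degree at most 2).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
vertices = sorted(adj)
estimate = preprocess(adj, vertices, num_samples=20, rng=random.Random(0))
answers = list(enumerate_answers(adj, vertices, estimate, threshold=0.0))
```

With `threshold=0.0` the enumeration phase always runs, so `answers` holds exactly the triangle vertices; a larger threshold trades completeness on sparse answer sets for the right to stop early, mirroring guarantee (2).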
