This paper aims to provide extremely efficient algorithms for approximate query enumeration on sparse databases that come with performance and accuracy guarantees. We introduce a new model for approximate query enumeration on classes of relational databases of bounded degree. We first prove that, on databases of bounded degree, any local first-order definable query can be enumerated approximately with constant delay after a constant-time preprocessing phase. We then extend this result, showing that on databases of bounded tree-width and bounded degree, every query expressible in first-order logic can be enumerated approximately with constant delay after a sublinear (more precisely, polylogarithmic) time preprocessing phase. Durand and Grandjean (ACM Transactions on Computational Logic 2007) proved that exact enumeration of first-order queries on databases of bounded degree can be done with constant delay after a linear-time preprocessing phase. Hence we achieve a significant speed-up in the preprocessing phase. Since sublinear running time does not allow reading the whole input database even once, sacrificing some accuracy is inevitable for our speed-up. Nevertheless, our enumeration algorithms come with guarantees: with high probability, (1) every enumerated tuple is either an answer to the query or 'close' to being an answer, and (2) if the proportion of tuples that are answers to the query is sufficiently large, then all answers are enumerated. Here 'closeness' is measured by a tuple edit distance in the input database. For local first-order queries, only actual answers are enumerated, which strengthens (1). Moreover, both the 'closeness' and the proportion required in (2) are controllable. We combine methods from property testing of bounded-degree graphs with logic and query enumeration, which we believe can inspire further research.
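Two ingredients named in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm. It only shows (a) that a local query can be decided for one vertex by inspecting a bounded-size neighbourhood when the degree is bounded, and (b) that the proportion of answers can be estimated from a number of random samples that does not depend on the database size, in the spirit of property testing. The graph encoding (a list of neighbour sets indexed by vertex), the example query `in_triangle`, and all function names are assumptions made for illustration.

```python
import random
from collections import deque


def ball(adj, v, r):
    """Return the set of vertices within distance r of v (breadth-first search).

    With maximum degree d the ball has at most 1 + d + d^2 + ... + d^r
    vertices, a bound that is independent of the total number of vertices.
    """
    seen = {v}
    frontier = deque([(v, 0)])
    while frontier:
        u, dist = frontier.popleft()
        if dist == r:
            continue
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, dist + 1))
    return seen


def in_triangle(adj, v):
    """Hypothetical local query: does v lie on a triangle?

    The answer depends only on the radius-1 neighbourhood of v, so the
    check costs O(d^2) set lookups for maximum degree d.
    """
    nbrs = adj[v]
    return any(w in adj[u] for u in nbrs for w in nbrs if u != w)


def estimate_answer_fraction(adj, query, samples, rng):
    """Estimate the fraction of vertices satisfying `query` from random samples.

    Only `samples` vertices are inspected, so the cost does not grow with
    len(adj) as long as each query check is constant time.
    """
    n = len(adj)
    hits = sum(query(adj, rng.randrange(n)) for _ in range(samples))
    return hits / samples


# A triangle 0-1-2 with a path 2-3-4 attached (illustrative data only).
adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2, 4}, {3}]
rng = random.Random(0)
fraction = estimate_answer_fraction(adj, in_triangle, samples=200, rng=rng)
```

In this toy graph three of the five vertices lie on the triangle, so the estimate concentrates around 0.6 as the number of samples grows. The sample count plays a role loosely analogous to the controllable accuracy parameters mentioned in the abstract, though the paper's precise parameters are not stated there.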


