This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work is the observation that, for a variety of analytics over databases, the data-intensive tasks can be decomposed into group-by aggregates over the join of the input database relations. We demonstrate the versatility and competitiveness of LMFAO on a handful of widely used analytics: learning ridge linear regression models, classification trees, regression trees, and the structure of Bayesian networks using Chow-Liu trees; and computing data cubes used for exploration in data warehousing. LMFAO consists of several layers of logical and code optimizations that systematically exploit sharing of computation, parallelism, and code specialization. We conducted two types of performance benchmarks. In experiments with four datasets, LMFAO outperforms by several orders of magnitude, on the one hand, a commercial database system and MonetDB for computing batches of aggregates and, on the other hand, TensorFlow, Scikit, R, and AC/DC for learning a variety of models over databases.
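To make the decomposition claim concrete, the following minimal sketch shows how a one-feature ridge regression can be fitted from a small batch of aggregates over a join, without revisiting the joined rows. It is an illustration only and not LMFAO's implementation: the relations `sales` and `stores`, the helper names, and the closed-form 2x2 solve are all assumptions chosen for brevity.

```python
# Hypothetical toy relations (not from the paper).
sales = [(1, 3.0), (1, 5.0), (2, 2.0), (2, 4.0)]   # (store, units)
stores = [(1, 10.0), (2, 20.0)]                    # (store, area)


def join(sales, stores):
    """Natural join on store, yielding (store, area, units) rows."""
    area_by_store = {store: area for store, area in stores}
    return [(s, area_by_store[s], u) for s, u in sales if s in area_by_store]


def batch_aggregates(rows):
    """One pass over the join computing a batch of five aggregates:
    COUNT(*), SUM(x), SUM(y), SUM(x*x), SUM(x*y), with x=area, y=units."""
    agg = {"count": 0.0, "sum_x": 0.0, "sum_y": 0.0,
           "sum_xx": 0.0, "sum_xy": 0.0}
    for _, x, y in rows:
        agg["count"] += 1
        agg["sum_x"] += x
        agg["sum_y"] += y
        agg["sum_xx"] += x * x
        agg["sum_xy"] += x * y
    return agg


def ridge_from_aggregates(agg, lam):
    """Minimise sum (y - w*x - b)^2 + lam * w^2 using only the aggregates.

    Normal equations (the intercept b is not penalised):
        (sum_xx + lam) * w + sum_x * b = sum_xy
        sum_x * w          + count * b = sum_y
    """
    a11, a12 = agg["sum_xx"] + lam, agg["sum_x"]
    a21, a22 = agg["sum_x"], agg["count"]
    det = a11 * a22 - a12 * a21
    w = (agg["sum_xy"] * a22 - a12 * agg["sum_y"]) / det
    b = (a11 * agg["sum_y"] - a21 * agg["sum_xy"]) / det
    return w, b


if __name__ == "__main__":
    agg = batch_aggregates(join(sales, stores))
    print(agg)
    print(ridge_from_aggregates(agg, lam=0.0))
```

The model parameters depend on the data only through the five aggregates, so the learning step itself is independent of the size of the join. The sketch materialises the join for simplicity; it does not attempt the sharing, parallelism, or code specialization that the abstract attributes to LMFAO.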