Apache Hive is an open-source relational database system for analytic big-data workloads. In this paper we describe the key innovations on Hive's journey from a batch tool to a fully fledged enterprise data warehousing system. We present a hybrid architecture that combines traditional MPP techniques with more recent big data and cloud concepts to achieve the scale and performance required by today's analytic applications. We explore the system by detailing enhancements along four main axes: transactions, optimizer, runtime, and federation. We then provide experimental results to demonstrate the performance of the system for typical workloads and conclude with a look at the community roadmap.