In recent years, precision agriculture, which relies on modern information and communication technologies, has become very popular. Raw and semi-processed agricultural data are usually collected from various sources, such as Internet of Things (IoT) devices, sensors, satellites, weather stations, robots, farm equipment, farmers, and agribusinesses. Moreover, agricultural datasets are very large, complex, unstructured, heterogeneous, non-standardized, and inconsistent. Hence, agricultural data mining is considered a Big Data application in terms of volume, variety, velocity, and veracity. It is a key foundation for establishing a crop intelligence platform, which will enable resource-efficient agronomic decision making and recommendations. In this paper, we design and implement a continental-level agricultural data warehouse by combining Hive, MongoDB, and Cassandra. Our data warehouse offers the following capabilities: (1) flexible schema; (2) data integration from multiple real agricultural datasets; (3) data science and business intelligence support; (4) high performance; (5) high storage capacity; (6) security; (7) governance and monitoring; (8) replication and recovery; (9) consistency, availability, and partition tolerance; (10) distributed and cloud deployment. We also evaluate the performance of our data warehouse.