Ecological sciences use imagery from a variety of sources to monitor and survey populations and ecosystems. Very High Resolution (VHR) satellite imagery provides an effective dataset for large-scale surveys. Convolutional Neural Networks have been successfully employed to analyze such imagery and detect large animals. As datasets increase in volume, O(TB), and number of images, O(1k), utilizing High Performance Computing (HPC) resources becomes necessary. In this paper, we investigate task-parallel, data-driven workflow designs to support imagery analysis pipelines with heterogeneous tasks on HPC. We analyze the capabilities of each design when processing a dataset of 3,000 VHR satellite images totaling 4 TB. We experimentally model the execution time of the tasks of the image processing pipeline. We perform experiments to characterize the resource utilization, total time to completion, and overheads of each design. Based on the model and on the overhead and utilization analysis, we show which design approach is best suited to scientific pipelines with similar characteristics.