Hardware-agnostic programming with high performance portability will be the bedrock for the ubiquitous adoption of emerging accelerator technologies in future heterogeneous high-performance computing (HPC) systems, and this adoption is key to achieving the next level of HPC performance across an expanding accelerator landscape. In this paper, we present HALO 1.0, an open-ended, extensible multi-agent software framework that implements a set of proposed hardware-agnostic accelerator orchestration (HALO) principles and a novel compute-centric message passing interface (C^2MPI) specification, enabling portable and performance-optimized execution of hardware-agnostic application code across heterogeneous accelerator resources. Experimental results for eight widely used HPC subroutines on Intel Xeon E5-2620 v4 CPUs, Intel Arria 10 GX FPGAs, and NVIDIA GeForce RTX 2080 Ti GPUs show that HALO 1.0 allows the same hardware-agnostic application code for these HPC kernels to run unchanged across all the computing devices with a consistent maximum performance portability score of 1.0, which is 2x to 861,883x higher than that of the OpenCL-based solution, whose performance portability score is low and unstable.
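To make the performance portability score concrete, the sketch below assumes it is the commonly used harmonic-mean metric of Pennycook et al.: the harmonic mean of an application's efficiency on each platform in a set, and zero if any platform is unsupported. The abstract does not define the metric, so this reading is an assumption. The function name `performance_portability` and the efficiency values are hypothetical and are not measurements from this work.

```python
from typing import Sequence


def performance_portability(efficiencies: Sequence[float]) -> float:
    """Harmonic mean of per-platform efficiencies (assumed metric).

    Each efficiency is achieved performance divided by the best known
    performance on that platform, in (0, 1]. A value of 0 or less marks
    an unsupported platform, which forces the whole score to 0.
    """
    if not efficiencies:
        raise ValueError("need at least one platform")
    if any(e <= 0.0 for e in efficiencies):
        return 0.0
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)


# Hypothetical efficiencies on a CPU, an FPGA, and a GPU (illustration only).
uniform = performance_portability([1.0, 1.0, 1.0])   # best on every device
skewed = performance_portability([1.0, 0.5, 0.001])  # one very slow device
unsupported = performance_portability([1.0, 0.0])    # one device cannot run it

print(uniform, skewed, unsupported)
```

Because the harmonic mean is dominated by its smallest term, a single poorly performing device pulls the score toward zero. Under this assumed metric, a score of 1.0 requires best-known performance on every device in the set.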