Beyond Exascale: Dataflow Domain Translation on a Cerebras Cluster

Simulation of physical systems is essential across scientific and engineering domains. Commonly used domain decomposition methods cannot simultaneously deliver a high simulation rate and high utilization in networked computing environments. In particular, Exascale systems deliver only a small fraction of their peak performance on these workloads. This paper introduces Domain Translation, a novel algorithm designed to overcome these limitations. On a cluster of 64 Cerebras CS-3 systems, we use this method to demonstrate unprecedented cluster performance across a range of metrics: simulations running in excess of 1.6 million time steps per second, and perfect weak scaling at 88% of peak performance. At this cluster scale, our implementation delivers 112 PFLOP/s in a power-unconstrained environment and 57 GFLOP/J in a power-limited environment. We illustrate the method by applying the shallow-water equations to model a tsunami following an asteroid impact, at 460 m resolution on a planetary scale.
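To put the headline figures in per-unit terms, the short sketch below does the arithmetic implied by the numbers quoted above: the average throughput per CS-3 system, assuming the 112 PFLOP/s is spread evenly across all 64 systems, and the wall-clock time per step implied by a rate of 1.6 million time steps per second (an upper bound, since the reported rate is "in excess of" that figure). The function names are illustrative only and are not part of the authors' implementation.

```python
# Headline figures quoted in the abstract.
NUM_SYSTEMS = 64            # Cerebras CS-3 systems in the cluster
CLUSTER_PFLOPS = 112.0      # PFLOP/s, power-unconstrained environment
STEPS_PER_SECOND = 1.6e6    # reported rate is "in excess of" this value


def per_system_pflops(total_pflops: float, systems: int) -> float:
    """Average throughput per system, assuming an even split across systems."""
    return total_pflops / systems


def max_step_time_us(steps_per_second: float) -> float:
    """Upper bound on wall-clock time per step, in microseconds."""
    return 1e6 / steps_per_second


if __name__ == "__main__":
    print(f"{per_system_pflops(CLUSTER_PFLOPS, NUM_SYSTEMS):.2f} PFLOP/s per CS-3")
    print(f"at most {max_step_time_us(STEPS_PER_SECOND):.3f} us per time step")
```

Running it prints roughly 1.75 PFLOP/s per system and at most 0.625 microseconds per time step. These are simple averages; they say nothing about how the work is actually distributed within the cluster.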