Energy hardware and workload aware job scheduling towards interconnected HPC environments

New HPC machines are approaching exascale. Their power consumption has been increasing, and researchers are studying ways to reduce it. A second trend is the growing complexity of HPC machines, with increasingly heterogeneous hardware components and different cluster architectures cooperating in the same machine. We refer to these as heterogeneous multi-cluster environments. To optimize performance and energy consumption in these environments, this paper proposes the Energy-Aware-Multi-Cluster (EAMC) job scheduling policy. The EAMC policy optimizes the scheduling and placement of jobs by predicting the performance and energy consumption of arriving jobs on different hardware architectures and processor frequencies, thereby reducing the workload's energy consumption, makespan, and response time. The policy assigns a different priority to each job-resource combination so that the most efficient ones are favored, while less efficient ones are still considered to a variable degree; this reduces response time and increases cluster utilization. We implemented the EAMC policy in Slurm and evaluated a scenario in which two CPU clusters collaborate in the same machine. Simulations of workloads running applications modeled on real-world ones show reductions in response time and makespan of up to 25% and 6%, respectively, while saving up to 20% of total energy consumed, compared to policies that minimize runtime, and reductions of 49%, 26%, and 6% compared to policies that minimize energy.
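The abstract does not give EAMC's priority formula, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions: each job-resource combination (a cluster plus a CPU frequency) is assumed to arrive with a predicted runtime and energy from some model, and its priority is assumed to decay smoothly with relative inefficiency instead of dropping to zero. All names (`Option`, `rank_options`, `place`, `energy_weight`, `sharpness`) are hypothetical; this is not the authors' implementation and it does not use any Slurm API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Option:
    """Hypothetical job-resource combination: a cluster plus a CPU frequency."""
    cluster: str
    freq_ghz: float
    runtime_s: float  # predicted runtime (assumed to come from a performance model)
    energy_j: float   # predicted energy (assumed to come from an energy model)


def rank_options(options: List[Option], energy_weight: float = 0.5,
                 sharpness: float = 2.0) -> List[Tuple[Option, float]]:
    """Return (option, priority) pairs sorted by descending priority.

    Cost is a weighted geometric mean of runtime and energy, each normalized
    by the best value among the options. Priority is (best_cost / cost) ** sharpness,
    so the most efficient option gets 1.0 and less efficient ones keep a smaller,
    nonzero priority. Assumes a non-empty list with positive predictions.
    """
    best_rt = min(o.runtime_s for o in options)
    best_en = min(o.energy_j for o in options)

    def cost(o: Option) -> float:
        return ((o.runtime_s / best_rt) ** (1 - energy_weight)
                * (o.energy_j / best_en) ** energy_weight)

    best_cost = min(cost(o) for o in options)
    ranked = [(o, (best_cost / cost(o)) ** sharpness) for o in options]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def place(options: List[Option], free_nodes: Dict[str, int],
          nodes_needed: int) -> Tuple[Optional[Option], float]:
    """Greedy placement: the highest-priority option whose cluster has room."""
    for opt, prio in rank_options(options):
        if free_nodes.get(opt.cluster, 0) >= nodes_needed:
            return opt, prio
    return None, 0.0  # no cluster can start the job now; it keeps waiting


if __name__ == "__main__":
    opts = [
        Option("cpu-a", 2.4, runtime_s=100, energy_j=1000),
        Option("cpu-a", 1.8, runtime_s=120, energy_j=900),
        Option("cpu-b", 2.0, runtime_s=150, energy_j=1500),
    ]
    for opt, prio in rank_options(opts):
        print(f"{opt.cluster} @ {opt.freq_ghz} GHz -> priority {prio:.2f}")
    print(place(opts, {"cpu-a": 0, "cpu-b": 4}, nodes_needed=2))
```

With these made-up predictions, `rank_options` gives priorities of 1.0, about 0.93, and about 0.44. If `cpu-a` has no free nodes, `place` falls back to the less efficient `cpu-b` option instead of leaving the job waiting, which mirrors the abstract's point that less efficient combinations are still considered to a variable degree. Raising `sharpness` penalizes inefficient options more strongly.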

翻译：新的 HPC 机器正在接近升级。这些机器的电力消耗一直在增加, 研究人员正在研究如何减少这些机器。第二个趋势是 HPC 机器日益复杂, 日益复杂, 日益多样化的硬件组件和不同组群结构在同一个机器中合作。我们指的是这些环境, 使用不同多组环境。为了优化这些环境中的性能和能源消耗, 本文建议了一种能源- Aware- Multi- Cluster(EAMC) 的工作时间安排政策。 EAMC 政策通过预测不同硬件架构和处理频率到来的工作的性能和能源消耗, 从而优化工作的时间安排和职位安排。第二个趋势是HPC 机器日益复杂, 其复杂性日益增大, 硬件组件日益复杂, 硬件组件日益多样化, 硬件组件日益多样化, 以及不同组群集的组合体结构。政策为每个工作资源组合分配了不同的优先事项, 以便最有效率的组合得到偏好, 而效率则仍然在不同的程度上被考虑, 降低反应时间, 减少反应时间和增加。我们在 Slurm 实施了 Eurm (E) 和我们评估了两个计算机中合作的 EPCU 6 的工作量。模拟 6 和模拟了工作量, 相对于实际- 25- pastime 25- primeme 20- pri- pri- pri- pri- pri- pri- s- pre- pri- pri- pri- pri- pri- pri- sal- pri- pay- pri- pal- pal- pal- 20- pay- pal- d- d- d- pay- 20