To boost the performance of data-intensive computing on HPC systems, in-memory computing frameworks such as Apache Spark and Flink use local DRAM for data storage. Optimizing how much memory is allocated to data storage is critical both for the performance of traditional HPC compute jobs and for the throughput of data-intensive applications that share the HPC resources. Current practice configures in-memory storage statically, which may either leave insufficient memory for compute jobs or forgo available memory that data-intensive applications could use. In this paper, we explore techniques that dynamically resize in-memory storage so that the right amount of memory is left for compute jobs. We have developed a dynamic memory controller, DynIMS, which infers the memory demands of compute tasks online and employs a feedback-based control model to adapt the capacity of in-memory storage. We evaluate DynIMS using mixed HPCC and Spark workloads on an HPC cluster. Experimental results show that DynIMS achieves up to a 5× performance improvement compared to systems with static memory allocation.
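The abstract does not state DynIMS's control law, so the following is only a minimal sketch of what a feedback-based capacity controller can look like, assuming a simple proportional controller. Each control interval it compares the node's measured memory usage with a target utilization and shrinks or grows the in-memory storage capacity by a gain times the error, clamped to a valid range. The class `ProportionalStorageController`, its parameters (`target_util`, `gain`, `min_cap_gb`, `max_cap_gb`), the `simulate` helper, and the assumption that the storage layer always fills its capacity are hypothetical illustrations, not the paper's model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProportionalStorageController:
    """Hypothetical proportional feedback controller for in-memory storage capacity.

    Assumption (not from the paper): new_cap = cap - gain * (used - target_util * total),
    clamped to [min_cap_gb, max_cap_gb].
    """
    total_mem_gb: float               # physical DRAM on the node
    target_util: float = 0.95         # hypothetical target fraction of DRAM in use
    gain: float = 1.0                 # hypothetical proportional gain
    min_cap_gb: float = 0.0           # storage never shrinks below this
    max_cap_gb: Optional[float] = None  # defaults to total DRAM when None

    def next_capacity(self, current_cap_gb: float, used_mem_gb: float) -> float:
        # Positive error: node uses more memory than the target, so shrink storage.
        # Negative error: memory is idle, so storage may grow.
        error = used_mem_gb - self.target_util * self.total_mem_gb
        new_cap = current_cap_gb - self.gain * error
        upper = self.max_cap_gb if self.max_cap_gb is not None else self.total_mem_gb
        return max(self.min_cap_gb, min(upper, new_cap))


def simulate(ctrl: ProportionalStorageController, compute_gb: float,
             start_cap_gb: float, steps: int) -> List[float]:
    """Run the controller against a fixed compute-job footprint.

    Hypothetical workload model: the storage layer always fills its capacity,
    so total usage = compute memory + storage capacity.
    """
    cap = start_cap_gb
    history = [cap]
    for _ in range(steps):
        used = compute_gb + cap
        cap = ctrl.next_capacity(cap, used)
        history.append(cap)
    return history


if __name__ == "__main__":
    ctrl = ProportionalStorageController(total_mem_gb=64.0, target_util=0.75, gain=0.5)
    # A compute job holding 20 GB leaves 0.75 * 64 - 20 = 28 GB for storage at steady state.
    for step, cap in enumerate(simulate(ctrl, compute_gb=20.0, start_cap_gb=40.0, steps=8)):
        print(f"step {step}: storage capacity {cap:.3f} GB")
```

With a gain of 0.5 in this toy model, the distance to the steady-state capacity halves every interval. A larger gain reacts faster to a compute job's sudden memory growth but, in a real system with measurement noise and delay, risks oscillation; the actual trade-off DynIMS makes is described in the paper, not here.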