MARS: Malleable Actor-Critic Reinforcement Learning Scheduler

In this paper, we introduce MARS, a new scheduling system for HPC-cloud infrastructures. It is based on a cost-aware, flexible reinforcement learning approach and serves as an intermediate layer for next-generation HPC-cloud resource managers. MARS combines pre-trained models derived from heuristic workloads into an ensemble and selects the most cost-effective strategy for optimization. A workflow application is split into several optimizable, dependent sub-tasks. Each scheduled task is then executed according to the pre-defined resource management plan, and its execution generates a reward. Finally, MARS updates its Deep Neural Network (DNN) model based on that reward, so the existing models are refined through reinforcement. MARS adapts to the dynamics of workflow applications: at run time it selects the most cost-effective scheduling solution from among pre-built scheduling strategies (backfilling, SJF, etc.) and a self-learning deep neural network model. We evaluate MARS on several real-world workflow traces, where it achieves 5%-60% better performance than state-of-the-art approaches.
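The selection step described in the abstract can be illustrated with a minimal sketch. This is not the MARS implementation: it assumes a single simulated node, uses mean waiting time as a stand-in cost, includes only two pre-built strategies (FCFS and SJF), and replaces the DNN update with a hypothetical scalar value update. All names (`Job`, `pick_strategy`, `update_value`) are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    runtime: float  # execution time on the single simulated node (assumption)


def fcfs(jobs: List[Job]) -> List[Job]:
    """First-come, first-served: keep submission order."""
    return list(jobs)


def sjf(jobs: List[Job]) -> List[Job]:
    """Shortest job first: order jobs by runtime."""
    return sorted(jobs, key=lambda j: j.runtime)


def mean_wait(order: List[Job]) -> float:
    """Average waiting time when jobs run back to back on one node."""
    clock, total = 0.0, 0.0
    for job in order:
        total += clock          # this job waited until the node was free
        clock += job.runtime    # node is busy for the job's runtime
    return total / len(order)


Strategy = Callable[[List[Job]], List[Job]]


def pick_strategy(
    jobs: List[Job], strategies: Dict[str, Strategy]
) -> Tuple[str, Dict[str, float]]:
    """Return the name of the cheapest strategy and the cost of each one."""
    costs = {name: mean_wait(plan(jobs)) for name, plan in strategies.items()}
    best = min(costs, key=costs.get)
    return best, costs


def update_value(old: float, reward: float, lr: float = 0.1) -> float:
    """Scalar stand-in for a model update: move the estimate toward the reward."""
    return old + lr * (reward - old)


jobs = [Job("a", 8.0), Job("b", 2.0), Job("c", 4.0)]
strategies = {"fcfs": fcfs, "sjf": sjf}

best, costs = pick_strategy(jobs, strategies)
value = update_value(0.0, reward=-costs[best])  # lower cost gives higher reward
print(best, costs, value)
```

In this toy setting SJF wins because it minimizes mean waiting time on a single node. A real scheduler would score strategies against the cost model and resource plan the paper describes, which the abstract does not specify.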

