ReStore: In-Memory REplicated STORagE for Rapid Recovery in Fault-Tolerant Algorithms - Zhuanzhi Papers


Processing (programming language) · Storage · Continuity · Cluster · Bioinformatics

January 25, 2023

ReStore: In-Memory REplicated STORagE for Rapid Recovery in Fault-Tolerant Algorithms


Lukas Hübner, Demian Hespe, Peter Sanders, Alexandros Stamatakis

Fault-tolerant distributed applications require mechanisms to recover data lost via a process failure. On modern cluster systems it is typically impractical to request replacement resources after such a failure. Therefore, applications have to continue working with the remaining resources. This requires redistributing the workload and that the non-failed processes reload data. We present an algorithmic framework and its C++ library implementation ReStore for MPI programs that enables recovery of data after process failures. By storing all required data in memory via an appropriate data distribution and replication, recovery is substantially faster than with standard checkpointing schemes that rely on a parallel file system. As the application developer can specify which data to load, we also support shrinking recovery instead of recovery using spare compute nodes. We evaluate ReStore in both controlled, isolated environments and real applications. Our experiments show loading times of lost input data in the range of milliseconds on up to 24 576 processors and a substantial speedup of the recovery time for the fault-tolerant version of a widely used bioinformatics application.
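The abstract does not spell out the data distribution, so the following is only a minimal Python sketch of the general idea of in-memory replication with shrinking recovery, not ReStore's actual algorithm or API. The placement rule, the function names `replica_holders` and `plan_recovery`, and all parameters are assumptions made for illustration. Each block is kept on several ranks; after a failure, the blocks whose primary owner died are reassigned to the surviving ranks, and each is loaded from a surviving copy.

```python
from typing import Dict, List, Set, Tuple


def replica_holders(block: int, num_blocks: int, num_procs: int, replicas: int) -> List[int]:
    """Ranks that hold a copy of `block` under a hypothetical placement rule."""
    if not 1 <= replicas <= num_procs:
        raise ValueError("need 1 <= replicas <= num_procs")
    # Assumption: the primary owner is chosen by contiguous block ranges.
    primary = block * num_procs // num_blocks
    # Assumption: further copies are spread evenly around the rank ring.
    stride = max(1, num_procs // replicas)
    return [(primary + k * stride) % num_procs for k in range(replicas)]


def plan_recovery(
    num_blocks: int, num_procs: int, replicas: int, failed: Set[int]
) -> Dict[int, List[Tuple[int, int]]]:
    """Map each surviving rank to the (block, source_rank) pairs it should load."""
    survivors = [r for r in range(num_procs) if r not in failed]
    plan: Dict[int, List[Tuple[int, int]]] = {r: [] for r in survivors}
    # Blocks whose primary owner failed need a new owner among the survivors.
    lost = [
        b for b in range(num_blocks)
        if replica_holders(b, num_blocks, num_procs, replicas)[0] in failed
    ]
    for i, block in enumerate(lost):
        sources = [
            h for h in replica_holders(block, num_blocks, num_procs, replicas)
            if h not in failed
        ]
        if not sources:
            raise RuntimeError(f"block {block} is irrecoverable: all copies lost")
        # Shrinking recovery: deal lost blocks round-robin to the survivors.
        new_owner = survivors[i % len(survivors)]
        # Prefer a local copy when the new owner already holds one.
        source = new_owner if new_owner in sources else sources[0]
        plan[new_owner].append((block, source))
    return plan
```

For example, `plan_recovery(16, 8, 3, {2})` assigns the two blocks formerly owned by rank 2 to ranks 0 and 1, both loaded from a surviving replica. In a real MPI program the plan would be executed with point-to-point or collective transfers among the remaining processes; that part is omitted here.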

