Characterizing a Neutron-Induced Fault Model for Deep Neural Networks

Fernando Fernandes dos Santos, Angeliki Kritikakou, Josie Esteban Rodriguez Condia, Juan David Guerrero Balaguera, Matteo Sonza Reorda, Olivier Sentieys, Paolo Rech

Evaluating the reliability of Deep Neural Networks (DNNs) executed on Graphics Processing Units (GPUs) is challenging because the hardware architecture is highly complex and the software frameworks are composed of many layers of abstraction. While software-level fault injection is a common and fast way to evaluate the reliability of complex applications, it may produce unrealistic results because it has limited access to hardware resources and the adopted fault models may be too naive (i.e., single and double bit flips). Conversely, physical fault injection with a neutron beam provides realistic error rates but lacks visibility into fault propagation. This paper proposes a characterization of the DNN fault model that combines neutron beam experiments with fault injection at the software level. We exposed GPUs running General Matrix Multiplication (GEMM) and DNNs to a neutron beam to measure their error rates. On DNNs, we observe that the percentage of critical errors can be up to 61%, and we show that ECC is ineffective in reducing critical errors. We then performed a complementary software-level fault injection, using fault models derived from RTL simulations. Our results show that, when complex fault models are injected, the YOLOv3 misdetection rate is very close to the rate measured in beam experiments, which is 8.66x higher than the rate measured with fault injection using only single-bit flips.
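To make the single-bit-flip baseline mentioned in the abstract concrete, the following is a minimal sketch of software-level fault injection into one operand of a small GEMM. It is an illustration under stated assumptions, not the authors' injector or their RTL-derived fault models: the helper names (`flip_bit_float32`, `matmul`, `inject_and_compare`) are hypothetical, values are treated as IEEE 754 float32, and the matrices are plain Python lists.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Flip one bit of value's IEEE 754 float32 encoding (0 = mantissa LSB, 31 = sign)."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 as a 32-bit unsigned integer, XOR the bit, convert back.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def matmul(a, b):
    """Naive GEMM on lists of lists (hypothetical stand-in for a GPU kernel)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inject_and_compare(a, b, row, col, bit):
    """Flip one bit of a[row][col], rerun the GEMM, and list the corrupted outputs."""
    golden = matmul(a, b)
    faulty_a = [list(r) for r in a]  # copy so the fault-free input is preserved
    faulty_a[row][col] = flip_bit_float32(faulty_a[row][col], bit)
    faulty = matmul(faulty_a, b)
    corrupted = [
        (i, j)
        for i, golden_row in enumerate(golden)
        for j, g in enumerate(golden_row)
        if faulty[i][j] != g
    ]
    return golden, faulty, corrupted


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[1.0, 1.0], [1.0, 1.0]]
    _, _, corrupted = inject_and_compare(a, b, row=0, col=0, bit=31)
    print("corrupted output elements:", corrupted)
```

In this toy setup, a single flipped bit in one input element already corrupts every output element of the affected row, which hints at why the choice of fault model and injection site matters when comparing against beam-measured error rates.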

