Today, neural networks underpin breakthroughs in virtually every technical domain. Their deployment on accelerators has recently improved the performance and efficiency of these systems. At the same time, the rising rate of hardware failures caused by the latest scaled-down semiconductor technology needs to be addressed. Because accelerator systems often back time-critical applications such as self-driving cars or medical diagnosis, these hardware failures must be eliminated. Our research evaluates these failures from a system-level point of view. Based on our results, we identify findings that are critical to enhancing system reliability, and we propose an efficient method to avoid these failures with minimal hardware overhead.