As the complexity of deep learning (DL) models increases, their compute requirements grow accordingly. Deploying a Convolutional Neural Network (CNN) involves two phases: training and inference. Since inference typically takes place on resource-constrained devices, substantial research has explored low-power inference on custom hardware accelerators. Training, on the other hand, is both more compute- and memory-intensive and is primarily performed on power-hungry GPUs in large-scale data centres. CNN training on FPGAs is a nascent field of research, primarily due to the lack of tools for easily prototyping and deploying hardware and/or algorithmic techniques for power-efficient CNN training. This work presents Barista, an automated toolflow that provides seamless integration of FPGAs into the training of CNNs within the popular deep learning framework Caffe. To the best of our knowledge, this is the only tool that allows such versatile and rapid deployment of hardware and algorithms for the FPGA-based training of CNNs, providing the necessary infrastructure for further research and development.