Reduced-precision and variable-precision multiply-accumulate (MAC) operations provide opportunities to significantly improve the energy efficiency and throughput of DNN accelerators with no or limited loss in algorithmic performance, paving the way towards deploying AI applications on resource-constrained edge devices. Accordingly, various precision-scalable MAC array (PSMA) architectures have recently been proposed. However, it is difficult to make a fair comparison between these alternatives, as each proposed PSMA is demonstrated in a different system with a different technology. This work aims to provide a clear view of the PSMA design space and to offer insights for selecting the optimal architecture based on designers' needs. First, we introduce a precision-enhanced for-loop representation for DNN dataflows. Next, we use this new representation to build a comprehensive PSMA taxonomy, capable of systematically covering the most prominent state-of-the-art PSMAs as well as uncovering new PSMA architectures. Following that, we build a highly parameterized PSMA template that can be configured at design time into a large subset of the design space spanned by the taxonomy. This allows us to fairly and thoroughly benchmark 72 different PSMA architectures. We perform these studies in 28nm technology, targeting run-time precision scalability from 8 to 2 bits, operating at 200 MHz and 1 GHz. Analyzing the resulting energy efficiency and area breakdowns reveals key design guidelines for PSMA architectures.
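To make the idea of adding precision to a for-loop view of a MAC computation more concrete, the following is a minimal illustrative sketch, not the representation defined in this work. It assumes unsigned operands and splits each 8-bit operand into 2-bit chunks, so that two extra "precision" loops sit inside the usual dot-product loop. The names `split_into_chunks` and `precision_loop_dot` and the parameters `total_bits` and `chunk_bits` are hypothetical and chosen only for this example.

```python
def split_into_chunks(value, total_bits, chunk_bits):
    """Split an unsigned integer into little-endian chunks of chunk_bits each.

    Assumption: total_bits is a multiple of chunk_bits and value < 2**total_bits.
    """
    mask = (1 << chunk_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, chunk_bits)]


def precision_loop_dot(weights, activations, total_bits=8, chunk_bits=2):
    """Dot product written as a loop nest with explicit precision loops.

    Assumption: unsigned operands only; signed handling is omitted for brevity.
    """
    acc = 0
    # Ordinary dataflow loop over the elements of the dot product.
    for w, a in zip(weights, activations):
        w_chunks = split_into_chunks(w, total_bits, chunk_bits)
        a_chunks = split_into_chunks(a, total_bits, chunk_bits)
        # Precision loops: iterate over the low-precision chunks of each operand.
        for i, wc in enumerate(w_chunks):
            for j, ac in enumerate(a_chunks):
                # Each small partial product is shifted to its bit position.
                acc += (wc * ac) << (chunk_bits * (i + j))
    return acc
```

In this sketch, lowering `total_bits` to 2 leaves a single chunk per operand, so both precision loops collapse to one iteration each. Whether such loops are unrolled in space or executed over time is the kind of choice a PSMA design has to make.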