Deploying deep neural networks (DNNs) across homogeneous edge devices (devices with the same manufacturer-labeled SKU) often assumes identical performance among them. However, once a device model is widely deployed, individual devices diverge in performance after a period of operation, owing to differences in user configurations, environmental conditions, manufacturing variance, battery degradation, and so on. Existing DNN compression methods do not take this scenario into account and cannot guarantee good compression results on all homogeneous edge devices. To address this, we propose Homogeneous-Device Aware Pruning (HDAP), a hardware-aware DNN compression framework explicitly designed for homogeneous edge devices, aiming to achieve optimal average performance of the compressed model across all devices. To overcome the prohibitive cost of hardware-aware evaluation on thousands or millions of homogeneous edge devices, HDAP partitions the devices into a small number of clusters, which dramatically reduces the number of devices to evaluate, and replaces real-time hardware evaluation with surrogate-based evaluation. Experiments on ResNet50 and MobileNetV1 with the ImageNet dataset show that HDAP consistently achieves lower average inference latency than state-of-the-art methods, with substantial speedups (e.g., 2.86$\times$ at 1.0G FLOPs for ResNet50) on the homogeneous device clusters. HDAP thus offers a scalable, high-performance solution for deploying DNNs on homogeneous edge devices.
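To make the evaluation pipeline concrete, the following is a minimal sketch (not the paper's implementation) of the two ideas named above: clustering homogeneous devices by their measured latency profiles, and scoring a candidate pruned network with one surrogate latency model per cluster instead of evaluating it on every device. All names and data (`latency_profiles`, `arch_features`, the regressor choice) are hypothetical placeholders.

```python
# Hedged sketch, assuming devices are described by offline latency measurements
# on a set of benchmark sub-networks and pruned models by simple architecture
# features (e.g., per-layer channel counts / FLOPs). Synthetic data throughout.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Latency profiles: one row per device, one column per benchmark sub-network.
latency_profiles = rng.normal(loc=50.0, scale=5.0, size=(10_000, 16))

# 1) Partition the device population into a few clusters, so only the clusters
#    (not every individual device) need surrogate models.
n_clusters = 8
clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(latency_profiles)

# 2) Per-cluster surrogate: map architecture features of a pruned model to the
#    latency measured offline on devices of that cluster (toy labels here).
train_arch_features = rng.uniform(size=(512, 32))
surrogates = []
for c in range(n_clusters):
    base = latency_profiles[clusters.labels_ == c].mean()    # cluster-specific scale
    train_latency = base * train_arch_features.mean(axis=1)  # synthetic latency labels
    surrogates.append(
        GradientBoostingRegressor().fit(train_arch_features, train_latency)
    )

def average_latency(arch_feature_vec: np.ndarray) -> float:
    """Surrogate-based estimate of a pruned model's average latency across the
    device population, weighting each cluster by its number of devices."""
    weights = np.bincount(clusters.labels_, minlength=n_clusters) / len(latency_profiles)
    preds = [s.predict(arch_feature_vec[None, :])[0] for s in surrogates]
    return float(np.dot(weights, preds))

print(average_latency(rng.uniform(size=32)))
```

In this sketch the pruning search would query `average_latency` as its objective, so only `n_clusters` cheap surrogate predictions are needed per candidate rather than thousands of on-device measurements.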