Deploying deep neural networks (DNNs) across homogeneous edge devices (devices sharing the same manufacturer-assigned SKU) often assumes that all of them perform identically. However, once a device model is widely deployed, the performance of individual devices diverges over time. This divergence is caused by differences in user configurations, environmental conditions, manufacturing variances, battery degradation, and other factors. Existing DNN compression methods do not consider this scenario and cannot guarantee good compression results on all homogeneous edge devices. To address this, we propose Homogeneous-Device Aware Pruning (HDAP), a hardware-aware DNN compression framework explicitly designed for homogeneous edge devices, which aims to achieve optimal average performance of the compressed model across all devices. Because hardware-aware evaluation on thousands or millions of homogeneous edge devices is time-consuming, HDAP partitions the devices into several device clusters, which dramatically reduces the number of devices to evaluate, and uses surrogate-based evaluation in place of real-time hardware evaluation. Extensive experiments on multiple device types (Jetson Xavier NX and Jetson Nano) and task types (image classification with ResNet50, MobileNetV1, ResNet56, and VGG16; object detection with YOLOv8n) demonstrate that HDAP consistently achieves lower average latency and competitive accuracy compared to state-of-the-art methods, with significant speedups (e.g., 2.86$\times$ on ResNet50 at 1.0G FLOPs). HDAP thus offers an effective solution for scalable, high-performance DNN deployment on homogeneous edge devices.
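The clustering idea can be illustrated with a minimal sketch; it is not the HDAP implementation. It assumes that each device reports a single benchmark latency, that devices are grouped with a small 1-D k-means, and that the fleet-average latency of a candidate model is estimated by measuring one representative per cluster and weighting by cluster size. The names `kmeans_1d`, `fleet_average_latency`, and `measure_on_device`, as well as the synthetic fleet, are hypothetical and introduced only for illustration.

```python
import random
from statistics import mean


def kmeans_1d(values, k, iters=50, seed=0):
    """Tiny 1-D k-means (illustrative); returns a cluster index per value."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each device to its nearest cluster centre.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each centre to the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def fleet_average_latency(benchmark_ms, k, measure_on_device):
    """Estimate fleet-average latency from one representative per cluster."""
    labels = kmeans_1d(benchmark_ms, k)
    total = 0.0
    for c in range(k):
        members = [i for i, l in enumerate(labels) if l == c]
        if not members:
            continue
        center = mean(benchmark_ms[i] for i in members)
        # Representative: the member closest to the cluster mean.
        rep = min(members, key=lambda i: abs(benchmark_ms[i] - center))
        # Weight the single measurement by the cluster size.
        total += len(members) * measure_on_device(rep)
    return total / len(benchmark_ms)


# Synthetic fleet (assumption): two performance regimes of one SKU.
rng = random.Random(1)
fleet = [rng.gauss(20, 0.5) for _ in range(600)] + [rng.gauss(35, 0.5) for _ in range(400)]

# Hypothetical stand-in for running the compressed model on device i.
estimate = fleet_average_latency(fleet, k=2, measure_on_device=lambda i: 1.5 * fleet[i])
exact = mean(1.5 * v for v in fleet)
print(f"estimated: {estimate:.2f} ms, exact: {exact:.2f} ms")
```

In this sketch only `k` measurements are needed instead of one per device, which mirrors the reduction in evaluation cost described above. A surrogate model could replace `measure_on_device` to avoid on-device measurement altogether.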