DEFER: Distributed Edge Inference for Deep Neural Networks

Modern machine learning tools such as deep neural networks (DNNs) are playing a revolutionary role in many fields, including natural language processing, computer vision, and the internet of things. Once trained, deep learning models can be deployed on edge computers to perform classification and prediction on real-time data for these applications. Particularly for large models, the limited computational and memory resources of a single edge device can become the throughput bottleneck of an inference pipeline. To increase throughput and decrease per-device compute load, we present DEFER (Distributed Edge inFERence), a framework for distributed edge inference that partitions deep neural networks into groups of layers that can be spread across multiple compute nodes. The architecture consists of a single "dispatcher" node that distributes DNN partitions and inference data to the respective compute nodes. The compute nodes are connected in series, so that each node's computed result is relayed to the subsequent node, and the final result is returned to the dispatcher. We quantify the throughput, energy consumption, network payload, and overhead of our framework under realistic network conditions using the CORE network emulator. We find that for the ResNet50 model, the inference throughput of DEFER with 8 compute nodes is 53% higher, and the per-node energy consumption is 63% lower, than single-device inference. We further reduce network communication demands and energy consumption using the ZFP serialization and LZ4 compression algorithms. We have implemented DEFER in Python using the TensorFlow and Keras ML libraries, and have released DEFER as an open-source framework to benefit the research community.
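The dispatcher-and-chain architecture described in the abstract can be illustrated with a minimal sketch. This is not the DEFER implementation: it uses plain Python functions as stand-ins for Keras layers, runs everything in one process instead of over a network, and omits ZFP/LZ4 entirely. The names `partition_layers`, `ComputeNode`, and `dispatch` are hypothetical and chosen only for illustration, and the equal-size split is an assumed partitioning policy.

```python
from typing import Callable, List, Sequence

# A "layer" here is any function from activations to activations.
# Assumption: real DEFER partitions hold Keras layers; these are toy stand-ins.
Layer = Callable[[List[float]], List[float]]


def partition_layers(layers: Sequence[Layer], num_nodes: int) -> List[List[Layer]]:
    """Split layers into num_nodes contiguous groups of near-equal size."""
    if not 1 <= num_nodes <= len(layers):
        raise ValueError("num_nodes must be between 1 and len(layers)")
    base, extra = divmod(len(layers), num_nodes)
    partitions, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)  # first `extra` groups get one more
        partitions.append(list(layers[start:start + size]))
        start += size
    return partitions


class ComputeNode:
    """Hypothetical compute node that owns one contiguous partition."""

    def __init__(self, partition: List[Layer]) -> None:
        self.partition = partition

    def run(self, activations: List[float]) -> List[float]:
        # Apply this node's layers in order to the incoming activations.
        for layer in self.partition:
            activations = layer(activations)
        return activations


def dispatch(nodes: Sequence[ComputeNode], sample: List[float]) -> List[float]:
    """Relay a sample through the nodes in series and return the final result."""
    result = sample
    for node in nodes:
        result = node.run(result)  # in a deployment this hop is a network send
    return result


# Toy layers standing in for a trained model.
def scale(factor: float) -> Layer:
    return lambda xs: [factor * x for x in xs]


def shift(offset: float) -> Layer:
    return lambda xs: [x + offset for x in xs]


def relu(xs: List[float]) -> List[float]:
    return [max(0.0, x) for x in xs]


layers = [scale(2.0), shift(-3.0), relu, scale(0.5), shift(1.0)]
nodes = [ComputeNode(p) for p in partition_layers(layers, 3)]
output = dispatch(nodes, [1.0, 2.0, 4.0])
print(output)
```

Because the partitions are contiguous and applied in order, the chained result matches what a single device would compute. The sketch processes one sample at a time, so it shows only the data flow; any throughput benefit in a deployment would presumably come from the nodes working on different samples concurrently.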

