CPU-FPGA 混合平台的推断 (Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform) - 专知论文

会员服务 ·

0

推断 · GNN · 核化 · MoDELS · state-of-the-art ·

2022 年 6 月 17 日

Low-latency Mini-batch GNN Inference on CPU-FPGA Heterogeneous Platform

翻译：CPU-FPGA 混合平台的推断

Bingyi Zhang,Hanqing Zeng,Viktor Prasanna

Mini-batch inference of Graph Neural Networks (GNNs) is a key problem in many real-world applications. Recently, a GNN design principle of model depth-receptive field decoupling has been proposed to address the well-known issue of neighborhood explosion. Decoupled GNN models achieve higher accuracy than original models and demonstrate excellent scalability for mini-batch inference. We map Decoupled GNNs onto CPU-FPGA heterogeneous platforms to achieve low-latency mini-batch inference. On the FPGA platform, we design a novel GNN hardware accelerator with an adaptive datapath denoted Adaptive Computation Kernel (ACK) that can execute various computation kernels of GNNs with low-latency: (1) for dense computation kernels expressed as matrix multiplication, ACK works as a systolic array with fully localized connections, (2) for sparse computation kernels, ACK follows the scatter-gather paradigm and works as multiple parallel pipelines to support the irregular connectivity of graphs. The proposed task scheduling hides the CPU-FPGA data communication overhead to reduce the inference latency. We develop a fast design space exploration algorithm to generate a single accelerator for multiple target GNN models. We implement our accelerator on a state-of-the-art CPU-FPGA platform and evaluate the performance using three representative models (GCN, GraphSAGE, and GAT). Results show that our CPU-FPGA implementation achieves $21.4-50.8\times$, $2.9-21.6\times$, $4.7\times$ latency reduction compared with state-of-the-art implementations on CPU-only, CPU-GPU and CPU-FPGA platforms.

翻译：图形神经网络(GNNs)的微小分解是许多现实世界应用中的一个关键问题。最近,为了解决众所周知的邻里爆炸问题,提出了模型深度感应场脱钩的GNN设计原则。脱钩的GNN模型的精确度高于原始模型,并展示了小批量推断的精确度。我们在CPU-FPGA 混合平台上绘制了脱钩的GNNs GNs, 以达到低纬度微型分解。在FPGA平台上,我们设计了一个新型的GNN硬件加速器,配有适应性数据路标的调色点调整调色点的调色点。高压的GNNNs模型显示了密度计算内核内核内核的倍增倍, ACCFC是完全本地连接的组合体阵列, 使用分散式平台的范式, 并同时在多条线平台上设计一个用于支持图表的不规则的CNNNNC-AT调价点连接性连接。高的C-GS

0

相关内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

17+阅读 · 2021年9月17日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

专知会员服务

29+阅读 · 2019年11月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ICML'21 | 五篇图神经网络论文精选

ICML'21 | 五篇图神经网络论文精选

图与推荐

1+阅读 · 2021年10月15日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

[每周ArXiv] 最新几篇GNN论文

[每周ArXiv] 最新几篇GNN论文

图与推荐

0+阅读 · 2021年5月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

专知

13+阅读 · 2018年5月26日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

求解非线性方程的加速迭代算法

国家自然科学基金

0+阅读 · 2014年12月31日

异构动态移动通信网络的延时优化

国家自然科学基金

2+阅读 · 2013年12月31日

HD-Zip转录因子对杜梨耐渗透胁迫的调节机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

有限长区域中的空间耦合多元Rateless码研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向基于图的数据挖掘的FPGA加速方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

基于HM耦合效应的高渗透水压水工隧洞承载机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

面向嵌入式系统的虚拟化技术研究

国家自然科学基金

1+阅读 · 2009年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

Web Service QoS的多维多尺度模型及评估、预测方法的研究

国家自然科学基金

1+阅读 · 2008年12月31日

Hybrid-ARQ Based Relaying Strategies for Enhancing Reliability in Delay-Bounded Networks

Arxiv

0+阅读 · 2022年8月8日

Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

Arxiv

0+阅读 · 2022年8月5日

GNN4REL: Graph Neural Networks for Predicting Circuit Reliability Degradation

Arxiv

0+阅读 · 2022年8月4日

Interpretable and Efficient Heterogeneous Graph Convolutional Network

Arxiv

15+阅读 · 2021年9月8日

Federated Causal Inference in Heterogeneous Observational Data

Arxiv

24+阅读 · 2021年8月10日

Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning

Arxiv

15+阅读 · 2021年5月19日

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Arxiv

14+阅读 · 2021年2月16日

Distributed Graph Convolutional Networks

Arxiv

19+阅读 · 2020年7月13日

MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

Arxiv

44+阅读 · 2020年2月5日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

17+阅读 · 2021年9月17日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

【CIKM2019 Tutorial】Recent Developments of Deep Heterogeneous Information Network Analysis（深度异构信息网络分析的最新进展），附157页PDF免费下载

专知会员服务

29+阅读 · 2019年11月3日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《美陆军特种作战条令》最新102页

《洛克希德SR-71“黑鸟”侦察机动力系统》21页slides

美空军作战实验室通过人工智能和指挥控制技术创新推进杀伤链

《指挥控制能力分析方法论》最新报告

相关资讯

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ICML'21 | 五篇图神经网络论文精选

ICML'21 | 五篇图神经网络论文精选

图与推荐

1+阅读 · 2021年10月15日

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

会议交流 | IJCKG: International Joint Conference on Knowledge Graphs

开放知识图谱

0+阅读 · 2021年9月9日

[每周ArXiv] 最新几篇GNN论文

[每周ArXiv] 最新几篇GNN论文

图与推荐

0+阅读 · 2021年5月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

【论文推荐】最新十篇目标跟踪相关论文—多帧光流跟踪、动态图学习、MV-YOLO、姿态估计、深度核相关滤波、Benchmark

专知

13+阅读 · 2018年5月26日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

相关论文

Hybrid-ARQ Based Relaying Strategies for Enhancing Reliability in Delay-Bounded Networks

Arxiv

0+阅读 · 2022年8月8日

Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

Arxiv

0+阅读 · 2022年8月5日

GNN4REL: Graph Neural Networks for Predicting Circuit Reliability Degradation

Arxiv

0+阅读 · 2022年8月4日

Interpretable and Efficient Heterogeneous Graph Convolutional Network

Arxiv

15+阅读 · 2021年9月8日

Federated Causal Inference in Heterogeneous Observational Data

Arxiv

24+阅读 · 2021年8月10日

Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning

Arxiv

15+阅读 · 2021年5月19日

GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training

Arxiv

14+阅读 · 2021年2月16日

Distributed Graph Convolutional Networks

Arxiv

19+阅读 · 2020年7月13日

MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding

Arxiv

44+阅读 · 2020年2月5日

Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

Arxiv

14+阅读 · 2019年8月8日

相关基金

求解非线性方程的加速迭代算法

国家自然科学基金

0+阅读 · 2014年12月31日

异构动态移动通信网络的延时优化

国家自然科学基金

2+阅读 · 2013年12月31日

HD-Zip转录因子对杜梨耐渗透胁迫的调节机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

Vlasov-Poisson-Boltzmann方程研究

国家自然科学基金

0+阅读 · 2013年12月31日

有限长区域中的空间耦合多元Rateless码研究

国家自然科学基金

0+阅读 · 2012年12月31日

面向基于图的数据挖掘的FPGA加速方法研究

国家自然科学基金

2+阅读 · 2012年12月31日

基于HM耦合效应的高渗透水压水工隧洞承载机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

面向嵌入式系统的虚拟化技术研究

国家自然科学基金

1+阅读 · 2009年12月31日

矩阵分解的低延迟并行算法

国家自然科学基金

0+阅读 · 2009年12月31日

Web Service QoS的多维多尺度模型及评估、预测方法的研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员