The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all settings. To achieve high performance, the programmer must carefully select which set of optimizations and hardware platforms to use. The GraphIt programming language makes it easy for the programmer to write the algorithm once and optimize it for different inputs using a scheduling language. However, GraphIt currently has no support for generating high performance code for GPUs. Programmers must resort to re-implementing the entire algorithm from scratch in a low-level language with an entirely different set of abstractions and optimizations in order to achieve high performance on GPUs. We propose GG, an extension to the GraphIt compiler framework, that achieves high performance on both CPUs and GPUs using the same algorithm specification. GG significantly expands the optimization space of GPU graph processing frameworks with a novel GPU scheduling language and compiler that enables combining graph optimizations for GPUs. GG also introduces two performance optimizations, Edge-based Thread Warps CTAs load balancing (ETWC) and EdgeBlocking, to expand the optimization space for GPUs. ETWC improves load balancing by dynamically partitioning the edges of each vertex into blocks that are assigned to threads, warps, and CTAs for execution. EdgeBlocking improves the locality of the program by reordering the edges and restricting random memory accesses to fit within the L2 cache. We evaluate GG on 5 algorithms and 9 input graphs on both Pascal and Volta generation NVIDIA GPUs, and show that it achieves up to 5.11x speedup over state-of-the-art GPU graph processing frameworks, and is the fastest on 66 out of the 90 experiments.


翻译:图形程序的性能高度取决于算法、 输入图形的大小和结构以及基础硬件的特性。 没有一套单一的优化或一个硬件平台在所有设置中都能很好地运行。 要实现高性能, 程序员必须仔细选择要使用的优化和硬件平台。 图形用户程序语言使程序员很容易使用一个调度语言来写算法, 并优化它的不同输入。 然而, GrapIt 目前没有为 GPU 生成高性能代码的支持。 程序员必须从低级语言从零开始重新实施整个算法, 并且有一套完全不同的抽象和优化组合。 为了在 GPUPS 上实现高性能优化, 我们建议GPI 程序必须仔细选择GGP, 将GFI 和 GPUPS 的扩展, 将GFIFO 的快速性能操作程序升级到 MIDA 。

0
下载
关闭预览

相关内容

LibRec 精选:AutoML for Contextual Bandits
LibRec智能推荐
7+阅读 · 2019年9月19日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
18+阅读 · 2018年12月24日
spinningup.openai 强化学习资源完整
CreateAMind
6+阅读 · 2018年12月17日
强化学习 cartpole_a3c
CreateAMind
9+阅读 · 2017年7月21日
给DNN处理器跑个分 - 指标篇
StarryHeavensAbove
5+阅读 · 2017年7月9日
Arxiv
0+阅读 · 2021年3月8日
VIP会员
相关资讯
LibRec 精选:AutoML for Contextual Bandits
LibRec智能推荐
7+阅读 · 2019年9月19日
A Technical Overview of AI & ML in 2018 & Trends for 2019
待字闺中
18+阅读 · 2018年12月24日
spinningup.openai 强化学习资源完整
CreateAMind
6+阅读 · 2018年12月17日
强化学习 cartpole_a3c
CreateAMind
9+阅读 · 2017年7月21日
给DNN处理器跑个分 - 指标篇
StarryHeavensAbove
5+阅读 · 2017年7月9日
Top
微信扫码咨询专知VIP会员