DARM: 用于SIMT Thread 差异减少的控制光流熔化 -- -- 扩展版本 (DARM: Control-Flow Melding for SIMT Thread Divergence Reduction -- Extended Version) - 专知论文

会员服务 ·

0

散度 · 可约的 · Branch · Performer · 相似度 ·

2022 年 1 月 14 日

DARM: Control-Flow Melding for SIMT Thread Divergence Reduction -- Extended Version

翻译：DARM: 用于SIMT Thread 差异减少的控制光流熔化 -- -- 扩展版本

Charitha Saumya,Kirshanthan Sundararajah,Milind Kulkarni

GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model where a group of threads-wavefront or warp-execute instructions in lockstep. When threads in a group encounter a branching instruction, not all threads in the group take the same path, a phenomenon known as control-flow divergence. The control-flow divergence causes performance degradation because both paths of the branch must be executed one after the other. Prior research has primarily addressed this issue through architectural modifications. We observe that certain GPGPU kernels with control-flow divergence have similar control-flow structures with similar instructions on both sides of a branch. This structure can be exploited to reduce control-flow divergence by melding the two sides of the branch allowing threads to reconverge early, reducing divergence. In this work, we present DARM, a compiler analysis and transformation framework that can meld divergent control-flow structures with similar instruction sequences. We show that DARM can reduce the performance degradation from control-flow divergence.

翻译：GPGPPP 使用单一指示- 多元- 轨迹( SIMT) 执行模式, 即一组线条- 波浪前或曲速- Excute 指令在锁定处使用。当一个组的线条遇到分支指令时, 不是全部线条都走同一路径, 这种现象被称为控制流差异。控制流差异导致性能退化, 因为分支的两条路径必须执行一个又一个。先前的研究主要通过建筑修改来解决这个问题。我们观察到, 某些控制流差异的GPGPUPU内核有类似的控制流结构, 分支两侧都有类似的指令。这个结构可以被利用来减少控制流差异, 其方法是将分支两侧的线线条混合, 以早期重新配置, 减少差异。在这项工作中, 我们提出 DARM, 一个编译器分析和转换框架, 能够将不同的控制流结构与类似的指令序列混合。我们表明 DARM 可以减少控制流差异的性能降解。

0

相关内容

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

17+阅读 · 2021年9月17日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

心脏的多形态耦合与层级级联计算可视化方法的研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于对合否定的SBL公理化扩张系统的程度化推理及逻辑控制研究

国家自然科学基金

0+阅读 · 2014年12月31日

旋量玻色爱因斯坦凝聚体动力学的解析研究

国家自然科学基金

1+阅读 · 2014年12月31日

极大似然minwise哈希估计子研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于光电镊子免标记SERS微流控芯片的单个病原菌实时检测研究

国家自然科学基金

0+阅读 · 2013年12月31日

Pt/TiMxOy/Pt/Si界面调控及忆阻行为调制机理

国家自然科学基金

0+阅读 · 2012年12月31日

癌症的靶向基因 - 痘苗溶瘤病毒治疗策略

国家自然科学基金

1+阅读 · 2012年12月31日

数量性状基因定位分析中随机模型方差组分的回归解法

国家自然科学基金

0+阅读 · 2011年12月31日

维甲酸上调肺癌细胞中miRNA let7a表达的调控机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

复杂疾病的单体型关联分析方法

国家自然科学基金

0+阅读 · 2009年12月31日

Approximate Sampling and Counting of Graphs with Near-$P$-stable Degree Intervals

Arxiv

0+阅读 · 2022年4月20日

Adaptive Non-linear Filtering Technique for Image Restoration

Arxiv

1+阅读 · 2022年4月20日

What is the optimal schedule for the UEFA Champions League groups?

Arxiv

0+阅读 · 2022年4月19日

Partial Relaxed Optimal Transport for Denoised Recommendation

Arxiv

0+阅读 · 2022年4月19日

The Z-axis, X-axis, Weight and Disambiguation Methods for Constructing Local Reference Frame in 3D Registration: An Evaluation

Arxiv

0+阅读 · 2022年4月17日

Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect

Arxiv

0+阅读 · 2022年4月15日

Prefix-Free Coding for LQG Control

Prefix-Free Coding for LQG Control

Arxiv

0+阅读 · 2022年4月15日

Warped Dynamic Linear Models for Time Series of Counts

Warped Dynamic Linear Models for Time Series of Counts

Arxiv

0+阅读 · 2022年4月15日

Proximal nested sampling for high-dimensional Bayesian model selection

Proximal nested sampling for high-dimensional Bayesian model selection

Arxiv

0+阅读 · 2022年4月15日

On Scheduling Mechanisms Beyond the Worst Case

Arxiv

0+阅读 · 2022年4月14日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】开放数据结构，Open Data Structures，337页pdf

【干货书】开放数据结构，Open Data Structures，337页pdf

专知会员服务

17+阅读 · 2021年9月17日

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

AI CITY发展研究报告：“人工智能+”时代的智慧城市发展范式创新（2025年）

风格迁移：十年综述

【ICCV2025】CL-Splats：结合局部优化的高斯泼洒持续学习方法

【HKUST博士论文】迈向可扩展且具泛化能力的时空预测

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【推荐】YOLO实时目标检测(6fps)

【推荐】YOLO实时目标检测(6fps)

机器学习研究会

20+阅读 · 2017年11月5日

【推荐】SVM实例教程

【推荐】SVM实例教程

机器学习研究会

17+阅读 · 2017年8月26日

相关论文

Approximate Sampling and Counting of Graphs with Near-$P$-stable Degree Intervals

Arxiv

0+阅读 · 2022年4月20日

Adaptive Non-linear Filtering Technique for Image Restoration

Arxiv

1+阅读 · 2022年4月20日

What is the optimal schedule for the UEFA Champions League groups?

Arxiv

0+阅读 · 2022年4月19日

Partial Relaxed Optimal Transport for Denoised Recommendation

Arxiv

0+阅读 · 2022年4月19日

The Z-axis, X-axis, Weight and Disambiguation Methods for Constructing Local Reference Frame in 3D Registration: An Evaluation

Arxiv

0+阅读 · 2022年4月17日

Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect

Arxiv

0+阅读 · 2022年4月15日

Prefix-Free Coding for LQG Control

Prefix-Free Coding for LQG Control

Arxiv

0+阅读 · 2022年4月15日

Warped Dynamic Linear Models for Time Series of Counts

Warped Dynamic Linear Models for Time Series of Counts

Arxiv

0+阅读 · 2022年4月15日

Proximal nested sampling for high-dimensional Bayesian model selection

Proximal nested sampling for high-dimensional Bayesian model selection

Arxiv

0+阅读 · 2022年4月15日

On Scheduling Mechanisms Beyond the Worst Case

Arxiv

0+阅读 · 2022年4月14日

相关基金

心脏的多形态耦合与层级级联计算可视化方法的研究

国家自然科学基金

1+阅读 · 2015年12月31日

基于对合否定的SBL公理化扩张系统的程度化推理及逻辑控制研究

国家自然科学基金

0+阅读 · 2014年12月31日

旋量玻色爱因斯坦凝聚体动力学的解析研究

国家自然科学基金

1+阅读 · 2014年12月31日

极大似然minwise哈希估计子研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于光电镊子免标记SERS微流控芯片的单个病原菌实时检测研究

国家自然科学基金

0+阅读 · 2013年12月31日

Pt/TiMxOy/Pt/Si界面调控及忆阻行为调制机理

国家自然科学基金

0+阅读 · 2012年12月31日

癌症的靶向基因 - 痘苗溶瘤病毒治疗策略

国家自然科学基金

1+阅读 · 2012年12月31日

数量性状基因定位分析中随机模型方差组分的回归解法

国家自然科学基金

0+阅读 · 2011年12月31日

维甲酸上调肺癌细胞中miRNA let7a表达的调控机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

复杂疾病的单体型关联分析方法

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员