Stencil 计算高效矢量计划 (An Efficient Vectorization Scheme for Stencil Computation) - 专知论文

会员服务 ·

0

向量化 · 可约的 · Performer · AIM · 数据并行 ·

2021 年 3 月 16 日

An Efficient Vectorization Scheme for Stencil Computation

翻译：Stencil 计算高效矢量计划

Kun Li,Liang Yuan,Yunquan Zhang,Yue Yue,Hang Cao,Pengqi Lu

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data locality respectively. In this paper, the downsides of existing vectorization schemes are analyzed. Briefly, they either incur data alignment conflicts or hurt the data locality when integrated with tiling. Then we propose a novel transpose layout to preserve the data locality for tiling and reduce the data reorganization overhead for vectorization simultaneously. To further improve the data reuse at the register level, a time loop unroll-and-jam strategy is designed to perform multistep stencil computation along the time dimension. Experimental results on the AVX-2 and AVX-512 CPUs show that our approach obtains a competitive performance.

翻译：Stencils 计算是各种科学和工程应用中最重要的内核之一,各种工作都集中在矢量化和砖瓦技术上,旨在分别利用核心数据平行和数据位置。本文分析了现有矢量化计划的下方。简而言之,它们要么引起数据对齐冲突,要么在与平铺合并时损害数据位置。然后,我们提出一个新的转换布局,以保存数据配置地点,同时减少矢量化的数据重组间接费用。为了进一步改善在登记一级的数据再利用,设计了一个时间循环、无滚动和干扰战略,以便在时间方面进行多步的加速计算。AVX-2和AVX-512 CPU的实验结果显示,我们的方法具有竞争性的性能。

0

相关内容

向量化

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【推荐】用于解缠学习的半监督StyleGAN，Semi-Supervised StyleGAN for Disentanglement Learning

【推荐】用于解缠学习的半监督StyleGAN，Semi-Supervised StyleGAN for Disentanglement Learning

专知会员服务

36+阅读 · 2020年3月13日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

机器学习研究会

6+阅读 · 2017年8月23日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Hierarchical Graph Capsule Network

Hierarchical Graph Capsule Network

Arxiv

20+阅读 · 2020年12月16日

CEDR: Contextualized Embeddings for Document Ranking

Arxiv

4+阅读 · 2019年8月19日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

SepNE: Bringing Separability to Network Embedding

SepNE: Bringing Separability to Network Embedding

Arxiv

3+阅读 · 2019年2月26日

Deep Node Ranking: an Algorithm for Structural Network Embedding and End-to-End Classification

Deep Node Ranking: an Algorithm for Structural Network Embedding and End-to-End Classification

Arxiv

4+阅读 · 2019年2月11日

Attributed Network Embedding for Incomplete Structure Information

Attributed Network Embedding for Incomplete Structure Information

Arxiv

3+阅读 · 2018年11月28日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Density-Adaptive Kernel based Re-Ranking for Person Re-Identification

Arxiv

4+阅读 · 2018年5月20日

Adversarial Binary Coding for Efficient Person Re-identification

Arxiv

4+阅读 · 2018年4月6日

VIP会员

文章信息

相关主题

相关VIP内容

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

剑桥大学《数据科学: 原理与实践》课程，附PPT下载

专知会员服务

54+阅读 · 2021年1月20日

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

知识图谱推理，50页ppt，Salesforce首席科学家Richard Socher

专知会员服务

111+阅读 · 2020年6月10日

【推荐】用于解缠学习的半监督StyleGAN，Semi-Supervised StyleGAN for Disentanglement Learning

【推荐】用于解缠学习的半监督StyleGAN，Semi-Supervised StyleGAN for Disentanglement Learning

专知会员服务

36+阅读 · 2020年3月13日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

UC.Berkeley CS189讲义教材:《机器学习全面指南》，185页pdf

专知会员服务

162+阅读 · 2020年1月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

俄乌战争启示：坦克战与不断演变的战斗形态

《大规模作战行动中与无人机集成的C5ISR系统》

《主观概率约束下寻找可行系统及其军事应用》69页

《美政府问责局：多种挑战影响地面战车任务出勤率》2025最新130页

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

LibRec 精选：推荐的可解释性[综述]

LibRec 精选：推荐的可解释性[综述]

LibRec智能推荐

10+阅读 · 2018年5月4日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

【论文】图上的表示学习综述

【论文】图上的表示学习综述

机器学习研究会

15+阅读 · 2017年9月24日

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

【推荐】Python机器学习生态圈(Scikit-Learn相关项目)

机器学习研究会

6+阅读 · 2017年8月23日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Hierarchical Graph Capsule Network

Hierarchical Graph Capsule Network

Arxiv

20+阅读 · 2020年12月16日

CEDR: Contextualized Embeddings for Document Ranking

Arxiv

4+阅读 · 2019年8月19日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Efficient Parameter-free Clustering Using First Neighbor Relations

Efficient Parameter-free Clustering Using First Neighbor Relations

Arxiv

7+阅读 · 2019年2月28日

SepNE: Bringing Separability to Network Embedding

SepNE: Bringing Separability to Network Embedding

Arxiv

3+阅读 · 2019年2月26日

Deep Node Ranking: an Algorithm for Structural Network Embedding and End-to-End Classification

Deep Node Ranking: an Algorithm for Structural Network Embedding and End-to-End Classification

Arxiv

4+阅读 · 2019年2月11日

Attributed Network Embedding for Incomplete Structure Information

Attributed Network Embedding for Incomplete Structure Information

Arxiv

3+阅读 · 2018年11月28日

Efficient and Effective $L_0$ Feature Selection

Efficient and Effective $L_0$ Feature Selection

Arxiv

5+阅读 · 2018年8月7日

Density-Adaptive Kernel based Re-Ranking for Person Re-Identification

Arxiv

4+阅读 · 2018年5月20日

Adversarial Binary Coding for Efficient Person Re-identification

Arxiv

4+阅读 · 2018年4月6日

微信扫码咨询专知VIP会员