Optimizing High-Performance Linpack for Exascale Accelerated Architectures - Zhuanzhi (专知) Papers



Heterogeneous · AI research · Heterogeneous computing · Central processing unit (CPU) · Decomposition

April 20, 2023

Optimizing High-Performance Linpack for Exascale Accelerated Architectures


Noel Chalmers, Jakub Kurzak, Damon McDougall, Paul T. Bauman

We detail the performance optimizations made in rocHPL, AMD's open-source implementation of the High-Performance Linpack (HPL) benchmark targeting accelerated node architectures designed for exascale systems such as the Frontier supercomputer. The implementation leverages the high-throughput GPU accelerators on the node via highly optimized linear algebra libraries, and uses the entire CPU socket to perform the latency-sensitive factorization phases. We detail novel performance improvements such as a multi-threaded approach to computing the panel factorization phase on the CPU, time-sharing of CPU cores between processes on the node, and several optimizations that hide MPI communication. We present performance results for this implementation of the HPL benchmark on a single node of the Frontier early access cluster at Oak Ridge National Laboratory, as well as results for scaling to multiple nodes.
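HPL solves a dense system Ax = b by LU factorization with partial pivoting. The "panel factorization" named above is the narrow, latency-bound step that factors one block of columns; the bulk of the arithmetic sits in the trailing-matrix update, a large matrix multiply of the kind typically handed to GPU linear algebra libraries. The following minimal NumPy sketch illustrates that split on a single process using the textbook right-looking blocked algorithm. It is not rocHPL's code: it has no MPI distribution, no look-ahead, and no multi-threading, and the function names (`panel_factor`, `blocked_lu`, `lu_solve`, `hpl_gflops`, `scaled_residual`) are hypothetical. The operation count 2/3 n^3 + 3/2 n^2 is the one HPL uses to report its flop rate; the residual check is written in the style of HPL's scaled residual and is an assumption, not a copy of its source.

```python
import time

import numpy as np


def panel_factor(A, k, nb, piv):
    """Unblocked LU with partial pivoting on panel columns k..k+nb-1.
    Row swaps are applied across the full width of A and recorded in piv."""
    n = A.shape[0]
    end = min(k + nb, n)
    for j in range(k, end):
        # Pivot: largest-magnitude entry in column j at or below the diagonal.
        p = j + int(np.argmax(np.abs(A[j:, j])))
        if p != j:
            A[[j, p], :] = A[[p, j], :]
            piv[[j, p]] = piv[[p, j]]
        # Scale below the diagonal to form the L multipliers.
        A[j + 1:, j] /= A[j, j]
        # Rank-1 update restricted to the remaining panel columns.
        A[j + 1:, j + 1:end] -= np.outer(A[j + 1:, j], A[j, j + 1:end])


def blocked_lu(A, nb):
    """Right-looking blocked LU: returns packed L\\U and piv with A[piv] = L @ U."""
    A = A.astype(float, copy=True)
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(0, n, nb):
        end = min(k + nb, n)
        panel_factor(A, k, nb, piv)  # latency-sensitive phase (CPU in rocHPL)
        if end < n:
            # U12 = L11^{-1} A12, with L11 unit lower triangular.
            L11 = np.tril(A[k:end, k:end], -1) + np.eye(end - k)
            A[k:end, end:] = np.linalg.solve(L11, A[k:end, end:])
            # Trailing update A22 -= L21 @ U12: the GEMM that dominates runtime.
            A[end:, end:] -= A[end:, k:end] @ A[k:end, end:]
    return A, piv


def lu_solve(LU, piv, b):
    """Solve A x = b given A[piv] = L @ U packed in LU."""
    n = LU.shape[0]
    y = b[piv].astype(float)
    for i in range(n):  # forward substitution, unit lower L
        y[i] -= LU[i, :i] @ y[:i]
    x = np.empty(n)
    for i in range(n - 1, -1, -1):  # back substitution, upper U
        x[i] = (y[i] - LU[i, i + 1:] @ x[i + 1:]) / LU[i, i]
    return x


def hpl_gflops(n, seconds):
    """Flop rate using HPL's nominal operation count 2/3 n^3 + 3/2 n^2."""
    return (2.0 / 3.0 * n**3 + 1.5 * n**2) / seconds / 1e9


def scaled_residual(A, x, b):
    """||Ax - b||_inf / (eps * (||A||_inf ||x||_inf + ||b||_inf) * n)."""
    eps = np.finfo(float).eps
    r = np.linalg.norm(A @ x - b, np.inf)
    scale = np.linalg.norm(A, np.inf) * np.linalg.norm(x, np.inf) + np.linalg.norm(b, np.inf)
    return r / (eps * scale * A.shape[0])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, nb = 256, 32
    A = rng.uniform(-0.5, 0.5, (n, n))
    b = rng.uniform(-0.5, 0.5, n)
    t0 = time.perf_counter()
    LU, piv = blocked_lu(A, nb)
    x = lu_solve(LU, piv, b)
    elapsed = time.perf_counter() - t0
    print(f"n={n} nb={nb} time={elapsed:.3f}s "
          f"rate={hpl_gflops(n, elapsed):.3f} Gflop/s "
          f"scaled residual={scaled_residual(A, x, b):.3e}")
```

The block size `nb` sets the trade-off the abstract alludes to: a wider panel makes the trailing GEMM more efficient but lengthens the serial panel phase, which is why speeding up the panel (for example with multiple CPU threads) and overlapping it with communication and the update matters at scale.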

