
In this work, we introduce a series of architecture modifications that aim to boost neural networks' accuracy while retaining their GPU training and inference efficiency. We first demonstrate and discuss the bottlenecks induced by FLOPs-oriented optimizations. We then suggest alternative designs that better utilize GPU structure and assets. Finally, we introduce a new family of GPU-dedicated models, called TResNet, which achieves better accuracy and efficiency than previous ConvNets. Using a TResNet model with GPU throughput similar to ResNet50, we reach 80.7% top-1 accuracy on ImageNet. Our TResNet models also transfer well to competitive datasets and achieve state-of-the-art accuracy on Stanford Cars (96.0%), CIFAR-10 (99.0%), CIFAR-100 (91.5%), and Oxford Flowers (99.1%). The implementation is available at:

https://github.com/mrT23/TResNet
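
The paper's efficiency metric is GPU throughput rather than FLOPs. As a rough illustration of how such a comparison is typically measured (this is not code from the TResNet repository), the sketch below times batched GPU inference for a torchvision ResNet50 in images/sec; swapping in a TResNet model from the linked repository would allow a like-for-like throughput comparison. It assumes PyTorch, torchvision, and a CUDA GPU are available.

```python
# Minimal throughput-benchmark sketch: measures images/sec, the efficiency
# metric the abstract uses, instead of FLOPs, which the paper argues can be
# a misleading proxy for real GPU speed.
import time
import torch
from torchvision.models import resnet50

model = resnet50().cuda().eval()
batch = torch.randn(64, 3, 224, 224, device="cuda")

with torch.no_grad():
    # Warm-up iterations so one-time CUDA kernel setup does not
    # distort the measurement.
    for _ in range(10):
        model(batch)
    torch.cuda.synchronize()

    iters = 50
    start = time.time()
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()  # wait for all queued GPU work to finish
    elapsed = time.time() - start

print(f"throughput: {iters * batch.size(0) / elapsed:.1f} images/sec")
```

The explicit `torch.cuda.synchronize()` calls matter: CUDA launches are asynchronous, so timing without synchronization would measure only kernel-launch overhead, not actual GPU work.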


Latest Papers

Deep reinforcement learning (RL) has made groundbreaking advancements in robotics, data center management and other applications. Unfortunately, system-level bottlenecks in RL workloads are poorly understood; we observe fundamental structural differences in RL workloads that make them inherently less GPU-bound than supervised learning (SL). To explain where training time is spent in RL workloads, we propose RL-Scope, a cross-stack profiler that scopes low-level CPU/GPU resource usage to high-level algorithmic operations, and provides accurate insights by correcting for profiling overhead. Using RL-Scope, we survey RL workloads across their major dimensions, including ML backend, RL algorithm, and simulator. For ML backends, we explain a $2.3\times$ difference in runtime between equivalent PyTorch and TensorFlow algorithm implementations, and identify a bottleneck rooted in overly abstracted algorithm implementations. For RL algorithms and simulators, we show that on-policy algorithms are at least $3.5\times$ more simulation-bound than off-policy algorithms. Finally, we profile a scale-up workload and demonstrate that GPU utilization metrics reported by commonly used tools dramatically inflate GPU usage, whereas RL-Scope reports true GPU-bound time. RL-Scope is an open-source tool available at https://github.com/UofT-EcoSystem/rlscope.
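
RL-Scope's key idea is attributing low-level resource usage to high-level algorithmic operations. Purely to illustrate that scoping concept (this is not the RL-Scope API; `env`, `policy`, and `update` are hypothetical placeholders for a Gym-style environment, an actor network, and a gradient step), a naive wall-clock version might look like the sketch below. Unlike RL-Scope, it neither separates CPU from GPU time nor corrects for profiling overhead.

```python
# Illustrative sketch only (not the RL-Scope tool): manually scoping
# wall-clock time to the high-level phases of an RL training loop, to show
# why on-policy workloads can end up simulation-bound rather than GPU-bound.
import time
from collections import defaultdict
from contextlib import contextmanager

phase_time = defaultdict(float)

@contextmanager
def scope(name):
    """Attribute the elapsed time of a code region to a named phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_time[name] += time.perf_counter() - start

def train_step(env, policy, update, horizon=128):
    # `env`, `policy`, and `update` are hypothetical placeholders.
    obs = env.reset()
    rollout = []
    for _ in range(horizon):
        with scope("inference"):    # policy forward pass
            action = policy(obs)
        with scope("simulation"):   # environment step, typically CPU-only
            obs, reward, done, _ = env.step(action)
        rollout.append((obs, action, reward))
        if done:
            obs = env.reset()
    with scope("backpropagation"):  # gradient update, typically GPU-bound
        update(rollout)

# After training, phase_time holds per-phase seconds; the relative sizes
# indicate whether the workload is simulation-bound or GPU-bound.
```

A real cross-stack profiler additionally has to track asynchronous GPU kernels launched inside each scope and subtract its own instrumentation cost, which is the overhead correction the abstract refers to.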
