Deep Neural Network (DNN) frameworks use distributed training to reduce time to convergence and to alleviate memory capacity limitations when training large models and/or processing high-dimensional inputs. With the steady growth of datasets and model sizes, model and hybrid parallelism are expected to play an important role in the future of distributed DNN training. We analyze the compute, communication, and memory requirements of Convolutional Neural Networks (CNNs) to understand the trade-offs among different parallelism approaches in terms of performance and scalability. We use this model-driven analysis as the basis for an oracle utility that can help detect the limitations and bottlenecks of different parallelism approaches at scale. We evaluate the oracle on six parallelization strategies, with four CNN models and multiple datasets (2D and 3D), on up to 1024 GPUs. The results demonstrate that the oracle achieves an average accuracy of about 86.74% compared to empirical results, and as high as 97.57% for data parallelism.
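As a rough illustration of the kind of model-driven cost analysis described above (not the paper's actual oracle), the sketch below estimates per-step compute, communication, and memory costs of data-parallel training from a few hardware and model parameters; all names and numbers are hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    flops: float            # forward-pass FLOPs per sample (assumed known)
    params: int             # number of trainable parameters
    activation_bytes: int   # activation memory per sample

def data_parallel_step_estimate(layers, batch_per_gpu, num_gpus,
                                gpu_flops=100e12, link_bandwidth=100e9,
                                bytes_per_param=4):
    """First-order estimate of one data-parallel training iteration.

    Compute: forward + backward approximated as 3x forward FLOPs,
             scaled by the local batch size.
    Communication: ring all-reduce of gradients, 2*(N-1)/N * gradient bytes.
    Memory: weights + gradients + optimizer state (approximated as 3x weights)
            plus activations for the local batch.
    These are analytical estimates, not measurements.
    """
    total_flops = sum(l.flops for l in layers) * batch_per_gpu * 3
    compute_time = total_flops / gpu_flops

    grad_bytes = sum(l.params for l in layers) * bytes_per_param
    comm_time = 2 * (num_gpus - 1) / num_gpus * grad_bytes / link_bandwidth

    mem_bytes = (grad_bytes * 3
                 + sum(l.activation_bytes for l in layers) * batch_per_gpu)
    return compute_time, comm_time, mem_bytes

# Hypothetical 2-layer CNN, per-GPU batch of 32, scaled to 1024 GPUs.
net = [Layer(flops=2e9, params=1_000_000, activation_bytes=8_000_000),
       Layer(flops=4e9, params=4_000_000, activation_bytes=2_000_000)]
print(data_parallel_step_estimate(net, batch_per_gpu=32, num_gpus=1024))
```

Comparing such analytical estimates against measured iteration times is one way an oracle of this kind can flag when communication or memory, rather than compute, becomes the bottleneck at scale.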