We propose a communication-bound-aware cross-domain resource assignment framework for pipeline-parallel distributed training over multi-datacenter optical networks. Compared with baselines, it lowers iteration time by 31.25% and reduces blocked requests by 13.20%.