We present a deep reinforcement learning approach to optimizing the execution cost of computation graphs in a static compiler. The key idea is to combine a neural network policy with a genetic algorithm, the Biased Random-Key Genetic Algorithm (BRKGA). The policy is trained to predict, given an input graph to be optimized, the node-level probability distributions for sampling mutations and crossovers in BRKGA. Our approach, "REINFORCE-based Genetic Algorithm Learning" (REGAL), uses the policy's ability to transfer to new graphs to significantly improve the solution quality of the genetic algorithm for the same objective evaluation budget. As a concrete application, we show results for minimizing peak memory in TensorFlow graphs by jointly optimizing device placement and scheduling. REGAL achieves on average 3.56% lower peak memory than BRKGA on previously unseen graphs, outperforming all the algorithms we compare to, and giving a 4.4x bigger improvement than the next best algorithm. We also evaluate REGAL on a production compiler team's performance benchmark of XLA graphs and achieve on average 3.74% lower peak memory than BRKGA, again outperforming all others. Our approach and analysis are made possible by collecting a dataset of 372 unique real-world TensorFlow graphs, more than an order of magnitude more data than previous work.
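To illustrate the mechanism the abstract describes, the following is a minimal sketch of BRKGA's parametrized uniform crossover and mutant sampling, with the single scalar crossover bias of the classic algorithm generalized to per-node probabilities, which is the hook a learned policy could use to steer the search. The function names and the per-gene `rho` parameterization are illustrative assumptions, not the paper's actual implementation.

```python
import random

def biased_crossover(elite, non_elite, rho):
    """BRKGA parametrized uniform crossover on random-key vectors.

    Each offspring gene is copied from the elite parent with probability
    rho[i], else from the non-elite parent. Classic BRKGA uses one scalar
    rho > 0.5 for all genes; a policy network could instead emit a
    node-level rho vector conditioned on the input graph (assumption:
    this per-gene form is how the learned distributions plug in).
    """
    return [e if random.random() < r else n
            for e, n, r in zip(elite, non_elite, rho)]

def sample_mutant(n, key_sampler=random.random):
    """Sample a fresh random-key chromosome of length n.

    Plain BRKGA draws each key uniformly from [0, 1); a learned policy
    could supply a per-node sampler (e.g. a Beta distribution per node)
    via key_sampler to bias where mutants land.
    """
    return [key_sampler() for _ in range(n)]

# Usage: rho = 1.0 forces inheritance from the elite parent, rho = 0.0
# from the non-elite parent; intermediate values mix the two.
elite, non_elite = [0.1, 0.2, 0.3], [0.9, 0.8, 0.7]
child = biased_crossover(elite, non_elite, [0.7, 0.7, 0.7])
```

Each chromosome is a vector of keys in [0, 1); a problem-specific decoder (for the paper's application, one mapping keys to a device placement and schedule) turns it into a candidate solution, so the genetic operators themselves stay problem-independent.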