Code optimization is a crucial task aimed at improving code performance. However, this process is often tedious and complex, highlighting the need for automatic code optimization techniques. Reinforcement Learning (RL) has emerged as a promising approach for tackling such complex optimization problems. In this project, we introduce MLIR RL, an RL environment for the MLIR compiler, dedicated to facilitating MLIR compiler research and enabling automatic code optimization. We propose a multi-discrete formulation of the action space, in which the full action space is the Cartesian product of simpler action subspaces. We also propose a new method, called level pointers, that reduces the size of the action space associated with the loop interchange transformation, enabling more efficient and effective policy learning. To demonstrate the effectiveness of MLIR RL, we train an RL agent to optimize MLIR Linalg code targeting CPUs. The code comes from two domain-specific sources: deep-learning models generated from PyTorch, and Lattice Quantum Chromodynamics (LQCD) code generated from an LQCD compiler. The result of this work is a research environment that allows the community to experiment with novel ideas in RL-driven loop-nest optimization.
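To make the multi-discrete formulation concrete, the sketch below shows how such an action space could be declared with Gymnasium's `MultiDiscrete` space, where each sampled action is one choice per sub-space. The sub-space sizes and the transformation set are illustrative assumptions, not the environment's actual definition.

```python
from gymnasium import spaces

# Hypothetical multi-discrete action space: an action is a tuple drawn from the
# Cartesian product of simpler sub-spaces. Sizes and meanings below are
# placeholders for illustration only.
action_space = spaces.MultiDiscrete([
    5,  # which transformation to apply (e.g. no-op, tiling, interchange, vectorization, parallelization)
    7,  # which loop level it targets (a "level pointer" rather than a full permutation)
    4,  # a transformation parameter, e.g. an index into a set of tiling factors
])

action = action_space.sample()  # e.g. array([2, 5, 1]) -> one discrete choice per sub-space
print(action)
```

Because the policy only picks one index per sub-space, the number of outputs grows with the sum of the sub-space sizes rather than their product, which is what makes the formulation tractable for learning.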