While polyhedral compilers have shown success in implementing advanced code transformations, they still struggle to select the transformations that yield the most profitable speedups. This has motivated the use of machine-learning-based cost models to guide the search for polyhedral optimizations. State-of-the-art polyhedral compilers have demonstrated a viable proof of concept of such an approach. While promising, this approach still faces significant limitations. State-of-the-art polyhedral compilers that use a deep learning cost model support only a small subset of affine transformations, which limits their ability to explore complex code transformations. Furthermore, their applicability does not scale beyond simple programs, which excludes many program classes from their scope, such as programs with non-rectangular iteration domains or multiple loop nests. These limitations significantly reduce the generality of such compilers and autoschedulers and call the whole approach into question. In this paper, we introduce LOOPer, the first polyhedral autoscheduler that uses a deep-learning-based cost model and covers a large space of affine transformations and programs. LOOPer can optimize an extensive set of programs while effectively applying complex sequences of polyhedral transformations. We implement and evaluate LOOPer and show that it achieves competitive speedups over the state of the art. On the PolyBench benchmarks, LOOPer achieves a geometric mean speedup of 1.84x over Tiramisu and 1.42x over Pluto, two state-of-the-art polyhedral autoschedulers.