Agentic Operator Generation for ML ASICs

Alec M. Hammond, Aram Markosyan, Aman Dontula, Simon Mahns, Zacharias Fisches, Dmitrii Pedchenko, Keyur Muzumdar, Natacha Supper, Mark Saroufim, Joe Isaacson, Laura Wang, Warren Hunt, Kaustubh Gondkar, Roman Levenstein, Gabriel Synnaeve, Richard Li, Jacob Kahn, Ajit Mathews

We present TritorX, an agentic AI system designed to generate functionally correct Triton PyTorch ATen kernels at scale for emerging accelerator platforms. TritorX integrates open-source large language models with a custom linter, JIT compilation, and a PyTorch OpInfo-based test harness. This pipeline runs both on real Meta Training and Inference Accelerator (MTIA) silicon and in hardware simulation environments for next-generation devices. In contrast to previous kernel-generation approaches, which optimize performance for a limited set of heavily used kernels, TritorX prioritizes coverage: it emphasizes correctness and generality across the entire operator set, including diverse data types, shapes, and argument patterns. In our experiments, TritorX generated kernels and wrappers for 481 unique ATen operators that pass all corresponding PyTorch OpInfo tests (over 20,000 tests in total). TritorX paves the way for overnight generation of complete PyTorch ATen backends for new accelerator platforms.
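The abstract names four stages: an LLM proposes a kernel, a linter screens it, JIT compilation builds it, and an OpInfo-style harness checks it, with failures presumably fed back to the model. The following is a minimal, dependency-free sketch of such a generate, lint, compile, test loop under loudly stated assumptions. It is not TritorX's implementation. The functions `run_agent_loop`, `lint`, `check`, and `scripted_generate` are hypothetical. Python's built-in `compile`/`exec` stands in for Triton JIT compilation. A list of sample argument tuples checked against a reference function stands in for PyTorch OpInfo samples. A scripted stub replaces the language model.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Attempt:
    source: str     # candidate source produced by the generator
    ok: bool        # True if the candidate passed lint, compile, and tests
    feedback: str   # error message fed back to the generator, or "pass"


def lint(source: str) -> Optional[str]:
    # Hypothetical lint rules: require a `kernel` entry point, forbid imports.
    if "def kernel(" not in source:
        return "lint: missing kernel entry point"
    if "import " in source:
        return "lint: imports are not allowed in candidates"
    return None


def check(fn, reference, samples, rtol=1e-6, atol=1e-9) -> Optional[str]:
    # Stand-in for an OpInfo-style harness: compare against a reference on each sample.
    for args in samples:
        try:
            got = list(fn(*args))
        except Exception as exc:
            return f"runtime error on {args!r}: {exc!r}"
        want = list(reference(*args))
        if len(got) != len(want) or not all(
            math.isclose(g, w, rel_tol=rtol, abs_tol=atol) for g, w in zip(got, want)
        ):
            return f"mismatch on {args!r}: got {got!r}, want {want!r}"
    return None


def run_agent_loop(
    generate: Callable[[str, List[str]], str],
    op_name: str,
    reference: Callable,
    samples: Sequence[Tuple],
    max_iters: int = 5,
) -> Tuple[Optional[Callable], List[Attempt]]:
    history: List[Attempt] = []
    feedback: List[str] = []
    for _ in range(max_iters):
        source = generate(op_name, feedback)
        fn = None
        problem = lint(source)
        if problem is None:
            try:  # stand-in for JIT compilation of the candidate kernel
                namespace: dict = {}
                exec(compile(source, f"<{op_name}>", "exec"), namespace)
                fn = namespace["kernel"]
            except Exception as exc:
                problem = f"compile error: {exc!r}"
        if problem is None:
            problem = check(fn, reference, samples)
        history.append(Attempt(source, problem is None, problem or "pass"))
        if problem is None:
            return fn, history
        feedback.append(problem)  # failure message goes back to the generator
    return None, history


# Scripted stub in place of an LLM: a wrong first draft, then a corrected one.
CANDIDATES = [
    "def kernel(xs):\n    return [x * 2 for x in xs]\n",
    "def kernel(xs):\n    return [max(x, 0.0) for x in xs]\n",
]


def scripted_generate(op_name: str, feedback: List[str]) -> str:
    return CANDIDATES[min(len(feedback), len(CANDIDATES) - 1)]


def relu_reference(xs):
    return [x if x > 0 else 0.0 for x in xs]


fn, history = run_agent_loop(
    scripted_generate, "relu", relu_reference, [([1.0, -2.0, 0.5],), ([],)]
)
```

Here the first draft fails the harness with a mismatch, that message is appended to `feedback`, and the second draft passes every sample. Keeping each stage as a plain function that returns an error string (or `None`) makes it straightforward to swap in a real linter, compiler, or test harness without changing the loop itself.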
