We introduce Quokka, the first systematic scaling law for diffusion language models (DLMs), covering both the compute-constrained and data-constrained regimes and studying the key modeling and optimization design choices. Quokka is a good friend of Chinchilla, with a wider scope. We hope these results provide short-term practical guidance for DLM training and long-term inspiration for the broader AI community.
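To make the Chinchilla connection concrete, the sketch below shows the standard Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta, and a grid search for the compute-optimal split of a FLOP budget C ≈ 6·N·D between parameters N and training tokens D. The constants are Chinchilla's published fits for autoregressive models, used here purely for illustration; they are not Quokka's DLM coefficients.

```python
# Illustrative Chinchilla-style scaling law: L(N, D) = E + A/N^a + B/D^b.
# Constants are Chinchilla's reported fits (Hoffmann et al.), NOT Quokka's.

def loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted training loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Sweep model sizes N to minimize loss subject to C = 6 * N * D."""
    best = None
    N = 1e6
    while 6 * N < C:          # require at least one training token
        D = C / (6 * N)       # tokens implied by the FLOP budget
        L = loss(N, D)
        if best is None or L < best[0]:
            best = (L, N, D)
        N *= 1.05             # geometric sweep over model sizes
    return best

L_opt, N_opt, D_opt = compute_optimal(1e21)
print(f"N* ~ {N_opt:.3g} params, D* ~ {D_opt:.3g} tokens, loss ~ {L_opt:.3f}")
```

Under this parameterization, the compute-constrained question is how to split C between N and D; the data-constrained regime instead fixes D and asks how loss scales with N and data reuse, which is where Quokka's wider scope comes in.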