Recursive reasoning models achieve remarkable performance on complex reasoning tasks through iterative refinement, enabling tiny networks to match large language models thousands of times their size. However, training remains computationally expensive, with prior work reporting approximately 36 GPU-hours per dataset, limiting broader adoption and research. We propose CGAR, a novel training methodology that applies curriculum learning to architectural depth rather than traditional data ordering. CGAR introduces two synergistic components: a Progressive Depth Curriculum, which dynamically adjusts recursion depth from shallow to deep configurations during training, preventing early overfitting while reducing computational cost; and Hierarchical Supervision Weighting, which applies exponentially decaying importance to supervision steps, aligning loss weights with the observed decay in gradient magnitudes. On Sudoku-Extreme with 423,168 test puzzles, CGAR achieves a 1.71x training speedup (10.93 to 6.38 hours, a 42% cost reduction) with only a 0.63-percentage-point accuracy drop (86.65% to 86.02%). Systematic ablations reveal that the Progressive Depth Curriculum alone achieves a 2.26x speedup with 85.47% accuracy, demonstrating a rare Pareto improvement where architectural curriculum simultaneously enhances training efficiency and solution quality. CGAR-trained models also exhibit superior inference efficiency, with 100% halting accuracy and 11% fewer reasoning steps. Our work demonstrates that a principled curriculum over architectural depth enables efficient training of recursive reasoning models on modest hardware. Code and models: https://github.com/Kaleemullahqasim/CGAR and https://huggingface.co/Kaleemullah/trm-cgar-sudoku
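The two components can be sketched as simple schedules. The following is a minimal illustration, not the paper's implementation: the depth bounds (`min_depth`, `max_depth`), the linear ramp, and the decay factor are all assumed for exposition; the actual CGAR schedules and hyperparameters are defined in the released code.

```python
def depth_schedule(epoch: int, total_epochs: int,
                   min_depth: int = 2, max_depth: int = 8) -> int:
    """Progressive Depth Curriculum (illustrative): grow recursion
    depth linearly from shallow to deep over training. The linear
    ramp and depth bounds here are assumptions, not CGAR's exact schedule."""
    frac = epoch / max(total_epochs - 1, 1)
    return min_depth + round(frac * (max_depth - min_depth))

def supervision_weights(num_steps: int, decay: float = 0.5) -> list[float]:
    """Hierarchical Supervision Weighting (illustrative): assign each
    supervision step an exponentially decaying loss weight, normalized
    to sum to 1. The decay factor 0.5 is a placeholder value."""
    raw = [decay ** t for t in range(num_steps)]
    total = sum(raw)
    return [w / total for w in raw]

# Early epochs train a shallow recursion; later epochs train deeper ones.
depths = [depth_schedule(e, 10) for e in range(10)]
# Earlier supervision steps dominate the loss; later steps contribute less,
# mirroring the observed decay in gradient magnitudes across steps.
weights = supervision_weights(4)
```

Shallow early-epoch depths are what reduce per-step compute (fewer recursion unrolls), while the decaying weights concentrate the loss on the steps that carry the largest gradients.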