With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution; however, it allocates the same compute budget to each prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through the token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance. EAGer allows branching to multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed. We find that across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer can reallocate the budget without accessing target labels, achieving the best efficiency-performance trade-off in terms of reasoning length and Pass@k. When target labels are accessible, EAGer generates up to 65% fewer tokens (hence saving compute) and achieves up to 37% improvement in Pass@k compared to full parallel sampling.
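The core mechanism described above, branching only at high-entropy tokens, can be sketched as follows. This is a minimal illustration of entropy-gated branching, not the paper's implementation: the function names and the threshold value are illustrative assumptions.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_branch(probs, threshold=1.0):
    """Branch into multiple reasoning paths only when the model is uncertain,
    i.e. when next-token entropy exceeds a threshold.
    The threshold of 1.0 nat is an illustrative choice, not from the paper."""
    return token_entropy(probs) > threshold

# A peaked distribution (confident model): generation continues on one path.
confident = [0.97, 0.01, 0.01, 0.01]
# A near-uniform distribution (uncertain model): spawn alternative paths.
uncertain = [0.25, 0.25, 0.25, 0.25]

print(should_branch(confident))  # no branching at a low-entropy token
print(should_branch(uncertain))  # branch at a high-entropy token
```

Under this gating rule, compute saved on prompts whose generations stay low-entropy can be redirected to prompts that trigger many branch points, which is the budget reallocation the abstract refers to.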