With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution; however, it allocates the same compute budget to each prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through the token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance. EAGer allows branching to multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed. We find that across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer can reallocate the budget without accessing target labels, achieving the best efficiency-performance trade-off in terms of reasoning length and Pass@k. When target labels are accessible, EAGer generates up to 65% fewer tokens (hence saving compute) and achieves up to 37% improvement in Pass@k compared to full parallel sampling.
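The core mechanism described above, branching only at high-entropy tokens, can be sketched as follows. This is a minimal illustration of entropy-gated branching, not the paper's implementation: the function names and the threshold value are illustrative assumptions.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def should_branch(probs, threshold=1.0):
    """Branch into multiple reasoning paths only when the model is uncertain,
    i.e. when next-token entropy exceeds a threshold.
    The threshold of 1.0 nat is an illustrative choice, not from the paper."""
    return token_entropy(probs) > threshold

# A peaked distribution (confident model): generation continues on one path.
confident = [0.97, 0.01, 0.01, 0.01]
# A near-uniform distribution (uncertain model): spawn alternative paths.
uncertain = [0.25, 0.25, 0.25, 0.25]

print(should_branch(confident))  # no branching at a low-entropy token
print(should_branch(uncertain))  # branch at a high-entropy token
```

Under this gating rule, compute saved on prompts whose generations stay low-entropy can be redirected to prompts that trigger many branch points, which is the budget reallocation the abstract refers to.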