Recent advances in Large Language Models (LLMs), particularly model scaling and test-time techniques, have greatly enhanced the reasoning capabilities of language models at the expense of higher inference costs. To lower inference costs, prior work trains router models or deferral mechanisms that allocate easy queries to a small, efficient model while forwarding harder queries to larger, more expensive models. However, these trained router models often lack robustness under domain shift and require expensive data synthesis techniques, such as Monte Carlo rollouts, to obtain sufficient ground-truth routing labels for training. In this work, we propose Confidence-Guided Stepwise Model Routing for Cost-Efficient Reasoning (STEER), a domain-agnostic framework that performs fine-grained, step-level routing between smaller and larger LLMs without relying on external models. STEER leverages confidence scores derived from the smaller model's logits before each reasoning step is generated, so that the larger model is invoked only when necessary. Extensive evaluations with different LLMs on a diverse set of challenging benchmarks spanning multiple domains, including mathematical reasoning, multi-hop QA, and planning, show that STEER achieves competitive or enhanced accuracy while reducing inference costs (up to +20% accuracy with 48% fewer FLOPs compared to solely using the larger model on AIME), outperforming baselines that rely on trained external modules. Our results establish model-internal confidence as a robust, domain-agnostic signal for model routing, offering a scalable pathway for efficient LLM deployment.
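The routing rule described above can be illustrated with a minimal sketch. The function names, the mean log-probability aggregation, and the threshold value below are illustrative assumptions, not details taken from the paper; the key idea is that the small model's own token probabilities for a candidate step decide whether to defer to the larger model.

```python
import math

def step_confidence(token_probs):
    # Mean log-probability of the tokens in a candidate reasoning step,
    # as assigned by the small model. Closer to 0 means more confident.
    return sum(math.log(p) for p in token_probs) / len(token_probs)

def route_step(token_probs, threshold=math.log(0.5)):
    # Illustrative threshold (not from the paper): keep the step on the
    # small model only if its average per-token probability exceeds 0.5.
    return "small" if step_confidence(token_probs) >= threshold else "large"

# A confident step: high per-token probabilities stay on the small model.
print(route_step([0.9, 0.8, 0.95]))  # small
# An uncertain step: low-probability tokens trigger deferral.
print(route_step([0.3, 0.2, 0.4]))   # large
```

In a real deployment, `token_probs` would come from the small model's logits (e.g., softmax probabilities of the sampled tokens), and the threshold would be tuned to trade accuracy against the cost of invoking the larger model.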


