LLMs now tackle a wide range of software-related tasks, yet we show that their performance varies markedly both across and within these tasks. Routing user queries to the appropriate LLM can therefore improve response quality while reducing cost. Prior work, however, has focused mainly on general-purpose LLM routing via black-box models. We introduce Routesplain, the first LLM router for software-related tasks, including multilingual code generation and repair, input/output prediction, and computer science QA. Unlike existing routing approaches, Routesplain first extracts human-interpretable concepts from each query (e.g., task, domain, reasoning complexity) and routes solely on these concepts, thereby providing intelligible, faithful rationales. We evaluate Routesplain with 16 state-of-the-art LLMs across eight software-related tasks; Routesplain outperforms individual models in both accuracy and cost, and equals or surpasses all black-box baselines, with concept-level intervention highlighting avenues for further router improvements.