Recent work shows that, beyond discrete reasoning through explicit chain-of-thought steps, which is constrained by the boundaries of natural language, large language models (LLMs) can also reason continuously in latent space, allowing richer information per step and thereby improving token efficiency. Despite this promise, latent reasoning still faces two challenges, especially in training-free settings: 1) purely latent reasoning broadens the search distribution by maintaining multiple implicit paths, which diffuses probability mass, introduces noise, and impedes convergence to a single high-confidence solution, thereby hurting accuracy; and 2) overthinking persists even without explicit text, wasting tokens and degrading efficiency. To address these issues, we introduce SwiReasoning, a training-free framework for LLM reasoning that features two key innovations: 1) SwiReasoning dynamically switches between explicit and latent reasoning, guided by block-wise confidence estimated from entropy trends in next-token distributions, to balance exploration and exploitation and promote timely convergence. 2) By limiting the maximum number of thinking-block switches, SwiReasoning curbs overthinking and improves token efficiency across varying problem difficulties. On widely used mathematics and STEM benchmarks, SwiReasoning consistently improves average accuracy by 1.5%-2.8% across reasoning LLMs of different model families and scales. Furthermore, under constrained token budgets, SwiReasoning improves average token efficiency by 56%-79%, with larger gains as budgets tighten.
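The switching policy sketched in the abstract (block-wise confidence from entropy trends, plus a cap on thinking-block switches) might look roughly like the following. This is a minimal illustrative sketch, not the paper's exact mechanism: the `window` size, the sign-of-trend decision rule, and the `max_switches` default are all assumptions introduced here for clarity.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def run_switching(dists, window=3, max_switches=4):
    """Illustrative controller over a sequence of next-token distributions.

    Rising entropy over the window (confidence falling) -> stay/enter
    latent mode to keep exploring multiple implicit paths; falling
    entropy (confidence rising) -> switch to explicit mode to exploit
    and converge. Switching stops after max_switches, mimicking the
    cap that curbs overthinking. All thresholds are assumptions.
    """
    mode = "latent"        # assumed starting mode
    switches = 0
    history = []           # entropy per reasoning step
    trace = []
    for dist in dists:
        history.append(entropy(dist))
        if len(history) >= window and switches < max_switches:
            trend = history[-1] - history[-window]
            if mode == "latent" and trend < 0:      # confidence rising
                mode, switches = "explicit", switches + 1
            elif mode == "explicit" and trend > 0:  # confidence falling
                mode, switches = "latent", switches + 1
        trace.append(mode)
    return trace, switches
```

For example, a sequence of distributions that sharpens step by step (entropy steadily decreasing) would trigger a single latent-to-explicit switch once the window fills, after which the controller stays in explicit mode to converge.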