Long chain-of-thought (Long-CoT) reasoning improves accuracy in LLMs, yet its verbose, self-reflective style often hinders effective distillation into small language models (SLMs). We revisit Long-CoT compression through the lens of capability alignment and ask: Can pruning improve reasoning? We propose Prune-on-Logic, a structure-aware framework that transforms Long-CoT into logic graphs and selectively prunes low-utility reasoning steps under self-verification constraints. Through a systematic analysis of three pruning strategies, targeting entire chains, core reasoning, and verification steps, we find that verification pruning consistently improves accuracy while reducing token usage, whereas pruning reasoning steps or pruning indiscriminately degrades performance. Our study reveals that effective pruning aligns supervision with model capacity rather than merely shortening inputs. The gains hold across tasks, model scales, and levels of CoT capability, with larger models benefiting more from pruning due to their richer but more redundant reasoning. Our empirical findings highlight pruning as a structural optimization strategy for aligning CoT reasoning with SLM capacity.
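To make the pruning recipe concrete, here is a minimal sketch of the idea, under assumptions not specified in the abstract: each Long-CoT step becomes a node in a logic graph tagged as either reasoning or verification, a hypothetical `utility` scorer (e.g., the drop in answer log-likelihood when a step is deleted) ranks verification nodes, and only low-utility verification nodes are removed, mirroring the finding that verification pruning helps while pruning reasoning steps hurts. The `Step` class, the `utility` scorer, and the `threshold` value are all illustrative, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One node in the logic graph: a single Long-CoT step."""
    text: str
    role: str                    # "reasoning" or "verification" (assumed labels)
    parents: list[int] = field(default_factory=list)  # indices of prerequisite steps

def prune_verification(steps: list[Step], utility, threshold: float = 0.1) -> list[str]:
    """Drop low-utility verification nodes, keep every reasoning node,
    and re-link the surviving graph before linearizing it back to text.

    `utility(i)` is a hypothetical scorer for step i; the paper does not
    specify one, so any per-step importance estimate could stand in here.
    """
    kept = [i for i, s in enumerate(steps)
            if s.role == "reasoning" or utility(i) >= threshold]
    kept_set = set(kept)
    for i in kept:
        # Remove edges that point at pruned nodes so the graph stays well-formed.
        steps[i].parents = [p for p in steps[i].parents if p in kept_set]
    return [steps[i].text for i in kept]
```

As a trivial usage check, `prune_verification(steps, utility=lambda i: 0.0)` removes every verification node while leaving the core reasoning chain intact; a real scorer would keep the self-verification steps the model still needs.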