Recently, Diffusion Large Language Models (DLLMs) have offered high throughput and effective sequential reasoning, making them a competitive alternative to autoregressive LLMs (ALLMs). However, parallel decoding, which enables simultaneous token updates, conflicts with the causal order often required for rigorous reasoning. We identify this conflict as the core Parallel-Sequential Contradiction (PSC). Behavioral analyses on both simple and complex reasoning tasks show that DLLMs exhibit genuine parallelism only for directly decidable outputs. As task difficulty increases, they revert to autoregressive-like behavior, a limitation exacerbated by autoregressive prompting, which nearly doubles the number of decoding steps through remasking without improving quality. Moreover, PSC restricts DLLMs' self-reflection, reasoning depth, and exploratory breadth. To further characterize PSC, we introduce three scaling dimensions for DLLMs: parallel, diffusion, and sequential. Empirically, while parallel scaling yields consistent improvements, diffusion and sequential scaling are constrained by PSC. Based on these findings, we propose several practical mitigations, namely parallel-oriented prompting, diffusion early stopping, and parallel scaling, to reduce the ineffectiveness and inefficiency induced by PSC.
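To make the tension behind PSC concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of confidence-thresholded parallel decoding with remasking: positions whose predicted confidence clears a threshold tau are committed simultaneously, while low-confidence positions stay masked and are re-predicted. The names `predict`, `parallel_decode`, `MASK`, and `tau` are hypothetical; in this toy setting, easy spans resolve in few steps, whereas hard, order-dependent spans fall back toward one-token-per-step, autoregressive-like behavior.

```python
# Illustrative sketch only: a toy confidence-thresholded parallel decoder
# of the kind DLLMs use, with remasking of low-confidence positions.
import random

MASK = "<mask>"

def predict(seq):
    """Stand-in for the denoiser: return (token, confidence) for each masked slot.
    Confidences are random here; a real DLLM would score each position."""
    return {i: (f"tok{i}", random.random()) for i, t in enumerate(seq) if t == MASK}

def parallel_decode(length=16, tau=0.7, max_steps=64):
    seq = [MASK] * length
    for step in range(1, max_steps + 1):
        preds = predict(seq)
        if not preds:                      # every position committed
            return seq, step - 1
        # Commit all positions above the confidence threshold in parallel.
        confident = {i: tok for i, (tok, c) in preds.items() if c >= tau}
        if not confident:
            # No position is directly decidable: commit only the single most
            # confident token, i.e. autoregressive-like one-token-per-step decoding.
            i, (tok, _) = max(preds.items(), key=lambda kv: kv[1][1])
            confident = {i: tok}
        for i, tok in confident.items():
            seq[i] = tok
    return seq, max_steps

if __name__ == "__main__":
    _, steps = parallel_decode()
    print(f"decoded 16 tokens in {steps} steps")
```

Under these assumptions, lowering tau increases parallelism but risks committing wrong tokens, while raising it (or prompting the model into strictly sequential behavior) drives the step count toward the sequence length, which is the trade-off PSC describes.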