Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning ability of LLMs, yet it faces challenges when extended to multimodal domains, particularly mathematical tasks. Existing MLLMs typically perform textual reasoning from a single static mathematical image, overlooking dynamic visual acquisition during reasoning. In contrast, humans repeatedly examine visual images and employ step-by-step reasoning to prove intermediate propositions. This strategy of decomposing the problem-solving process into key logical nodes adheres to Miller's Law in cognitive science. Inspired by this insight, we propose ViRC, a framework for multimodal mathematical tasks that introduces a Reason Chunking mechanism, structuring multimodal mathematical CoT into consecutive Critical Reasoning Units (CRUs) to simulate the problem-solving patterns of human experts. CRUs ensure intra-unit textual coherence for intermediate proposition verification while integrating visual information across units to generate subsequent propositions and support structured reasoning. To this end, we present the CRUX dataset, which uses three visual tools and four reasoning patterns to provide explicitly annotated CRUs across multiple reasoning paths for each mathematical problem. Leveraging CRUX, we propose a progressive training strategy inspired by human cognitive learning, comprising Instructional SFT, Practice SFT, and Strategic RL, aimed at further strengthening the model's Reason Chunking ability. The resulting ViRC-7B model achieves an 18.8\% average improvement over baselines across multiple mathematical benchmarks. Code is available at https://github.com/Leon-LihongWang/ViRC.