Large language models (LLMs) show strong reasoning via chain-of-thought (CoT) prompting, but the process is opaque, which makes verification, debugging, and control difficult in high-stakes settings. We present Vis-CoT, a human-in-the-loop framework that converts linear CoT text into an interactive reasoning graph. Users can visualize the logical flow, identify flawed steps, and intervene by pruning incorrect paths and grafting new, user-defined premises. This shifts interaction from passive observation to active collaboration, steering models toward more accurate and trustworthy conclusions. Across GSM8K and StrategyQA, Vis-CoT improves final-answer accuracy by up to 24 percentage points over non-interactive baselines. A user study also shows large gains in perceived usability and trust. Vis-CoT points to a practical path for more reliable, understandable, and collaborative reasoning by combining LLMs with targeted human oversight.
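The abstract describes the reasoning graph only at a high level. As a rough illustration of the prune-and-graft interaction, not the paper's actual implementation, one could model the graph as a directed structure over CoT steps; in the Python sketch below, all class and method names (ReasoningGraph, prune, graft) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StepNode:
    """One chain-of-thought step, kept as free text."""
    node_id: int
    text: str
    children: list[int] = field(default_factory=list)


class ReasoningGraph:
    """Hypothetical directed graph over CoT steps (illustrative, not Vis-CoT's API)."""

    def __init__(self) -> None:
        self.nodes: dict[int, StepNode] = {}
        self._next_id = 0

    def add_step(self, text: str, parent_id: int | None = None) -> int:
        """Append a reasoning step, optionally linked to a parent step."""
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = StepNode(node_id, text)
        if parent_id is not None:
            self.nodes[parent_id].children.append(node_id)
        return node_id

    def prune(self, node_id: int) -> None:
        """Remove a flawed step and every step that depends on it."""
        for child in list(self.nodes[node_id].children):
            self.prune(child)
        for node in self.nodes.values():
            if node_id in node.children:
                node.children.remove(node_id)
        del self.nodes[node_id]

    def graft(self, parent_id: int, premise: str) -> int:
        """Attach a user-defined premise where a pruned branch was."""
        return self.add_step(premise, parent_id)


# Parse a linear CoT into a chain, then intervene on a flawed step:
g = ReasoningGraph()
steps = ["48 cookies total", "Half eaten: 48/2 = 20", "Remaining: 28"]
prev = None
for s in steps:
    prev = g.add_step(s, prev)
g.prune(1)                            # cut the incorrect arithmetic and its dependents
g.graft(0, "Half eaten: 48/2 = 24")   # graft the corrected user-supplied premise
```

In this reading, pruning a node discards its entire downstream subgraph so stale conclusions cannot survive, and grafting re-anchors generation on the corrected premise; whether Vis-CoT uses this exact structure is an assumption here.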