Efficient retrieval from external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous work on training LLMs to leverage external retrievers for solving complex problems has predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, which makes the reasoning process supervisable and verifiable. It decomposes a complex problem into independently solvable sub-problems, each represented both in natural language and as an equivalent logical function that supports knowledge-base and web searches. Dependencies between sub-problems are passed as parameters of these logical functions, strengthening the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check whether a sub-problem lies within the LLM's intrinsic knowledge, in which case the model answers it directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes. The source code is available at https://github.com/OpenSPG/KAG-Thinker.
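To make the workflow described above concrete, the following is a minimal Python sketch of the planning loop: each sub-problem carries both a natural-language question and a logical-function form, answers to prerequisite sub-problems are substituted as parameters, and a knowledge boundary check decides between answering directly and calling an external retriever. All names here (SubProblem, llm.decompose, within_knowledge_boundary, retriever.search, etc.) are hypothetical illustrations under assumed interfaces, not the released KAG-Thinker API.

```python
# Illustrative sketch only; the llm and retriever objects are assumed to
# expose the hypothetical methods used below.
from dataclasses import dataclass, field


@dataclass
class SubProblem:
    """A sub-problem dually represented as natural language and a logical function."""
    question: str                  # natural-language form
    logical_fn: str                # logical-function form, e.g. "Retrieval(s={}, p=director, o=?)"
    depends_on: list = field(default_factory=list)  # indices of prerequisite sub-problems
    answer: str | None = None


def solve(question: str, llm, retriever) -> str:
    # 1) Decompose the complex question into independently solvable sub-problems.
    sub_problems: list[SubProblem] = llm.decompose(question)

    for sp in sub_problems:
        # 2) Substitute answers of prerequisite sub-problems as parameters of the
        #    logical function, keeping the reasoning chain logically coherent.
        grounded_fn = sp.logical_fn.format(
            *[sub_problems[j].answer for j in sp.depends_on]
        )

        # 3) Knowledge boundary determination: answer directly if the sub-problem
        #    falls within the LLM's intrinsic knowledge, otherwise search externally.
        if llm.within_knowledge_boundary(sp.question):
            sp.answer = llm.answer(sp.question)
        else:
            evidence = retriever.search(grounded_fn)  # knowledge-base / web search
            sp.answer = llm.answer(sp.question, evidence=evidence)

    # 4) Aggregate the solved sub-problems into the final answer.
    return llm.aggregate(question, sub_problems)
```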