Progress in Text-to-SQL systems is currently hindered by two problems: the scarcity of high-quality training data and the limited reasoning capabilities of models in complex scenarios. In this paper, we propose a holistic framework that addresses both through a dual-centric approach. From a Data-Centric perspective, we construct an iterative data factory that synthesizes RL-ready data, with strict verification ensuring high correctness and precise semantic-logic alignment. From a Model-Centric perspective, we introduce a novel Agentic Reinforcement Learning framework. It employs a Diversity-Aware Cold Start stage to initialize a robust policy, followed by Group Relative Policy Optimization (GRPO) to refine the agent's reasoning via environmental feedback. Extensive experiments on the BIRD and Spider benchmarks demonstrate that our synergistic approach achieves state-of-the-art performance among single-model methods.
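To make the idea of refining a policy via environmental feedback more concrete, the following is a minimal sketch of how execution feedback from a database could be turned into group-relative advantages in the style of GRPO. It is an illustration under stated assumptions, not the method used in this paper: the binary execution-match reward, the toy `singer` table, the helper names `execution_reward`, `group_relative_advantages`, and `demo`, and the choice of population standard deviation with a small `eps` are all hypothetical.

```python
import sqlite3
from statistics import mean, pstdev


def execution_reward(conn, predicted_sql, gold_sql):
    """Hypothetical binary reward: 1.0 if both queries return the same rows."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # SQL that fails to execute earns no reward
    gold = conn.execute(gold_sql).fetchall()
    # Compare as order-insensitive multisets of rows.
    same = sorted(predicted, key=repr) == sorted(gold, key=repr)
    return 1.0 if same else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation; eps avoids division by zero
    when every candidate in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def demo():
    # Toy in-memory database standing in for a benchmark database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bo', 24), (3, 'Cy', 45);
        """
    )
    gold = "SELECT name FROM singer WHERE age > 30"
    # One group of candidate queries sampled for the same question.
    candidates = [
        "SELECT name FROM singer WHERE age > 30",   # matches gold
        "SELECT name FROM singer WHERE age >= 31",  # equivalent result
        "SELECT name FROM singer WHERE age < 30",   # wrong rows
        "SELECT nme FROM singer",                   # invalid column
    ]
    rewards = [execution_reward(conn, sql, gold) for sql in candidates]
    advantages = group_relative_advantages(rewards)
    conn.close()
    return rewards, advantages


if __name__ == "__main__":
    for reward, advantage in zip(*demo()):
        print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

In this sketch, candidates whose execution results match the reference query receive positive advantages and the others negative ones, so the policy update is driven by relative quality within a group rather than by a separately learned value function.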