Visual-language reasoning, driving knowledge, and value alignment are essential for advanced autonomous driving systems. However, existing approaches rely largely on data-driven learning, making it difficult to capture the complex logic underlying driving decisions through imitation or limited reinforcement rewards. To address this, we propose KnowVal, a new autonomous driving system that enables visual-language reasoning through the synergistic integration of open-world perception and knowledge retrieval. Specifically, we construct a comprehensive driving knowledge graph that encodes traffic laws, defensive-driving principles, and ethical norms, complemented by an efficient LLM-based retrieval mechanism tailored to driving scenarios. Furthermore, we develop a human-preference dataset and train a Value Model to guide interpretable, value-aligned trajectory assessment. Experimental results show that our method substantially improves planning performance while remaining compatible with existing architectures. Notably, KnowVal achieves the lowest collision rate on nuScenes and state-of-the-art results on Bench2Drive.