Existing end-to-end autonomous driving methods typically rely on imitation learning (IL) but face a key challenge: the misalignment between open-loop training and closed-loop deployment. In closed-loop execution, this misalignment often triggers driver-initiated takeovers and system disengagements. How to leverage the expert takeover data from these disengagement scenarios to effectively expand the IL policy's capability remains a valuable yet underexplored problem. In this paper, we propose TakeAD, a novel preference-based post-optimization framework that fine-tunes a pre-trained IL policy on this disengagement data to improve closed-loop driving performance. First, we design an efficient pipeline for collecting expert takeover data, inspired by the human takeover mechanisms of real-world autonomous driving systems. Then, the post-optimization framework combines iterative Dataset Aggregation (DAgger) for imitation learning with Direct Preference Optimization (DPO) for preference alignment. The DAgger stage equips the policy with the fundamental capability to handle disengagement states by directly imitating expert interventions. The DPO stage then refines the policy's behavior to better align with expert preferences in disengagement scenarios. Over multiple iterations, the policy progressively learns recovery strategies for disengagement states, thereby narrowing the gap between open-loop training and closed-loop deployment. Experiments on the closed-loop Bench2Drive benchmark demonstrate the effectiveness of our method compared with pure IL methods, and comprehensive ablations confirm the contribution of each component.
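To make the preference-alignment stage concrete, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not taken from TakeAD itself: the abstract does not say how preference pairs are built, so the pairing used here (the expert takeover trajectory as the preferred sample, the policy's own trajectory that led to the disengagement as the rejected one) is an assumption, as are the function names, the scalar log-probability inputs, and the value of `beta`.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)); exp() only ever sees arguments <= 0."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    logp_chosen: float,        # log pi_theta(expert takeover trajectory | state)
    logp_rejected: float,      # log pi_theta(policy's own trajectory | state)
    ref_logp_chosen: float,    # same quantities under the frozen reference policy
    ref_logp_rejected: float,  # (assumed here to be the pre-trained IL policy)
    beta: float = 0.1,         # hypothetical temperature; not a value from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    The loss decreases as the fine-tuned policy raises the likelihood of the
    chosen sample relative to the rejected one, measured against the reference.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


# Policy identical to the reference: zero margin, so the loss equals log(2).
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)

# Policy already favors the expert takeover trajectory: loss falls below log(2).
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

In an iterative scheme like the one described above, a loss of this form would be minimized after each DAgger round, with the log-probabilities supplied by the driving policy's trajectory head rather than passed in as plain scalars.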