Deep reinforcement learning for end-to-end driving is limited by the need for complex reward engineering. Sparse rewards can circumvent this challenge but suffer from long training times and lead to sub-optimal policies. In this work, we explore full-control driving with only a goal-constrained sparse reward and propose a curriculum learning approach for end-to-end driving using only navigation view maps, which benefit from a small virtual-to-real domain gap. To address the complexity of multiple driving policies, we learn concurrent individual policies that are selected at inference by a navigation system. We demonstrate the ability of our proposal to generalize to unseen road layouts and to drive significantly longer than during training.
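A goal-constrained sparse reward can be sketched as follows; the function name, distance threshold, and termination flag are illustrative assumptions, not the paper's exact formulation:

```python
import math

def sparse_goal_reward(position, goal, radius=2.0, done=False):
    """Goal-constrained sparse reward (illustrative sketch).

    The agent receives a learning signal only at episode termination:
    +1 if it ends within `radius` metres of the goal, 0 otherwise.
    No intermediate shaping terms are provided, which is what makes
    the reward sparse and avoids hand-tuned reward engineering.
    """
    if not done:
        return 0.0
    distance_to_goal = math.dist(position, goal)
    return 1.0 if distance_to_goal <= radius else 0.0
```

Such a reward is trivial to specify but, as noted above, gives the policy little gradient early in training, which motivates the curriculum learning approach.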