Self-Improving Safety Performance of Reinforcement Learning Based Driving with Black-Box Verification Algorithms

In this work, we propose a self-improving artificial intelligence system that enhances the safety performance of reinforcement learning (RL)-based autonomous driving (AD) agents using black-box verification methods. RL algorithms have become popular in AD applications in recent years. However, the performance of existing RL algorithms depends heavily on the diversity of training scenarios. A lack of safety-critical scenarios during the training phase can result in poor generalization in real-world driving applications. We propose a novel framework in which the weaknesses of the training set are explored through black-box verification methods. After AD failure scenarios are discovered, the RL agent's training is re-initiated via transfer learning to improve performance in the previously unsafe scenarios. Simulation results demonstrate that our approach efficiently discovers safety failures of action decisions in RL-based adaptive cruise control (ACC) applications and significantly reduces the number of vehicle collisions through iterative application of our method. The source code is publicly available at https://github.com/data-and-decision-lab/self-improving-RL.
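The loop described above (search for failure scenarios, then resume training from the current agent on those scenarios, and repeat) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the authors' repository: the one-parameter `gain` controller stands in for the RL policy, the car-following dynamics and scenario ranges are invented, uniform random search stands in for the black-box verification algorithms, and a warm-started hill climb stands in for transfer learning.

```python
import random

def min_gap(gain, scenario, steps=100, dt=0.1):
    """Roll out a toy car-following episode; return the smallest gap seen (m)."""
    gap, ego_v, lead_v, lead_decel = scenario
    smallest = gap
    for _ in range(steps):
        lead_v = max(0.0, lead_v - lead_decel * dt)          # lead vehicle brakes
        desired = 5.0 + 1.5 * ego_v                          # assumed headway rule
        accel = max(-6.0, min(2.0, gain * (gap - desired)))  # stand-in for the RL policy
        ego_v = max(0.0, ego_v + accel * dt)
        gap += (lead_v - ego_v) * dt
        smallest = min(smallest, gap)
    return smallest

def sample_scenario(rng):
    # Hypothetical ranges: (initial gap m, ego speed m/s, lead speed m/s, lead decel m/s^2)
    return (rng.uniform(10, 60), rng.uniform(15, 30),
            rng.uniform(10, 30), rng.uniform(0, 6))

def count_collisions(gain, scenarios):
    """A scenario counts as a collision when the gap reaches zero."""
    return sum(min_gap(gain, s) <= 0.0 for s in scenarios)

def find_failures(gain, rng, n_samples=200):
    """Black-box search: only rollout outcomes are observed, never policy internals."""
    candidates = [sample_scenario(rng) for _ in range(n_samples)]
    return [s for s in candidates if min_gap(gain, s) <= 0.0]

def fine_tune(gain, failures, rng, n_trials=20):
    """Warm-start from the current gain; keep a candidate only if it collides less."""
    best, best_cost = gain, count_collisions(gain, failures)
    for _ in range(n_trials):
        cand = max(0.01, best + rng.gauss(0.0, 0.2))
        cost = count_collisions(cand, failures)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

def self_improve(gain, rounds=3, seed=0):
    """Alternate failure discovery and fine-tuning; record collisions per round."""
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        failures = find_failures(gain, rng)
        before = len(failures)
        gain = fine_tune(gain, failures, rng)
        history.append((before, count_collisions(gain, failures)))
    return gain, history
```

Calling `gain, history = self_improve(0.05)` returns the tuned parameter and, for each round, the pair (collisions before, collisions after) on that round's discovered failure set. Because a candidate is accepted only when it collides less, the second number never exceeds the first. The sketch scores only the failure set, so it says nothing about performance on scenarios that were already safe.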
