An A* Curriculum Approach to Reinforcement Learning for RGBD Indoor Robot Navigation

Training robots to navigate diverse environments is challenging because it combines several perception tasks, such as mapping and localization, with optimal path planning and control. Recently released photo-realistic simulators such as Habitat allow networks to be trained to output control actions directly from perception: agents use Deep Reinforcement Learning (DRL) to regress from the camera image to a control output in an end-to-end fashion. This is data-inefficient and can take several days to train on a GPU. Our paper addresses this problem by training the perception and control neural networks separately and by gradually increasing path complexity with a curriculum approach. Specifically, a pre-trained twin Variational AutoEncoder (VAE) compresses RGBD (RGB and depth) sensing from an environment into a latent embedding, which is then used to train a DRL-based control policy. A*, a traditional path planner, guides the policy, and the distance between the start and target locations is incrementally increased along the A* route as training progresses. We demonstrate the efficacy of the proposed approach, in terms of both increased performance and decreased training times, on the PointNav task in the Habitat simulation environment. This strategy for improving the training of direct-perception-based DRL navigation policies is expected to hasten the deployment of robots of particular interest to industry, such as co-bots on the factory floor and last-mile delivery robots.
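The curriculum idea, moving the target further along the A* route as training progresses, can be illustrated with a minimal sketch. The abstract does not specify the map representation, the schedule, or the criterion for advancing a stage, so everything below is an assumption made for illustration: a 4-connected occupancy grid, a Manhattan-distance heuristic, a linear schedule, and the hypothetical helper names `astar` and `curriculum_target`. It is not the authors' implementation and does not use the Habitat API.

```python
import heapq


def astar(grid, start, goal):
    """4-connected A* on a grid of 0 (free) / 1 (obstacle).

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > g_cost[cell]:
            continue  # stale heap entry; a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None


def curriculum_target(path, stage, num_stages):
    """Pick the waypoint on the A* path used as the target at a given stage.

    Assumed linear schedule: stage 0 uses a nearby waypoint and the final
    stage (num_stages - 1) uses the true goal at the end of the path.
    """
    last = len(path) - 1
    idx = max(1, round(last * (stage + 1) / num_stages))
    return path[min(idx, last)]


# Toy map: a wall forces a detour around the right-hand side.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
targets = [curriculum_target(path, stage, 4) for stage in range(4)]
print(targets)  # [(0, 2), (1, 3), (2, 2), (2, 0)]
```

In a training loop, the policy would be trained to reach `targets[stage]` from the start cell, and `stage` would be incremented once some success criterion is met. That criterion is left open here because the abstract does not state one.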
