Learning to Seek: Autonomous Source Seeking on a Nano Drone Microcontroller with Deep Reinforcement Learning

Nano drones are uniquely suited to fully autonomous applications because of their agility, low cost, and small size. However, their constrained form factor limits flight time, sensor payload, and compute capability, which severely restricts the use of source-seeking nano drones in GPS-denied, highly cluttered environments. The primary goal of our work is to demonstrate the effectiveness of deep reinforcement learning for fully autonomous navigation on highly constrained, general-purpose hardware, and to present a methodology for future applications. To this end, we present a deep reinforcement learning-based light-seeking policy that runs alongside the flight control stack on a commercial off-the-shelf ultra-low-power microcontroller (MCU). We describe our methodology for training deep reinforcement learning policies and executing them on constrained, general-purpose MCUs. By carefully designing the network input, we provide the agent with the features relevant to finding the source while reducing computational cost, enabling inference at up to 100 Hz. We verify our approach in simulation and with in-field tests on a Bitcraze Crazyflie, achieving a 94% success rate in a highly cluttered, randomized test environment. In simulation, the policy seeks light efficiently, reaching the goal in 65% fewer steps and with 60% shorter paths than a baseline 'roomba' algorithm.
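The central idea above, a small neural network policy evaluated inside the control loop of a microcontroller, can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the paper: the six-feature observation (for example, normalized obstacle distances and light readings), the layer sizes, the three-action discrete action space, and the names `TinyPolicy`, `dense`, `init_layer`, `control_period_ms`, and `ACTIONS` are all hypothetical. Random weights stand in for a trained policy, and real MCU firmware would be written in C/C++ rather than Python.

```python
import math
import random

# Hypothetical discrete action set (not the paper's actual action space).
ACTIONS = ("forward", "turn_left", "turn_right")


def dense(x, weights, biases):
    """Fully connected layer: y[j] = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(v):
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]


def init_layer(n_in, n_out, rng):
    """Random weights scaled by fan-in; placeholders for trained parameters."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def control_period_ms(rate_hz):
    """Time available per control step, e.g. 100 Hz -> 10 ms."""
    return 1000.0 / rate_hz


class TinyPolicy:
    """Small MLP: observation -> hidden -> hidden -> one score per action."""

    def __init__(self, n_obs, hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.layers = [init_layer(n_obs, hidden, rng),
                       init_layer(hidden, hidden, rng),
                       init_layer(hidden, n_actions, rng)]

    def act(self, obs):
        """Run one forward pass and return the index of the best-scoring action."""
        h = obs
        for i, (weights, biases) in enumerate(self.layers):
            h = dense(h, weights, biases)
            if i < len(self.layers) - 1:  # no activation on the output layer
                h = relu(h)
        return max(range(len(h)), key=h.__getitem__)

    def num_macs(self):
        """Multiply-accumulate operations per inference (ignores biases)."""
        return sum(len(w) * len(w[0]) for w, _ in self.layers)


if __name__ == "__main__":
    policy = TinyPolicy(n_obs=6, hidden=20, n_actions=3)
    # Hypothetical normalized observation, e.g. obstacle distances + light readings.
    obs = [0.8, 1.0, 0.3, 0.9, 0.42, 0.1]
    print(ACTIONS[policy.act(obs)], policy.num_macs(), control_period_ms(100))
```

With these illustrative sizes, one forward pass costs 6·20 + 20·20 + 20·3 = 580 multiply-accumulates, and every input feature removed saves one row of first-layer work, which is why a compact, well-chosen observation matters. At 100 Hz, the pass and any sensor preprocessing must fit within a 10 ms control period shared with the flight control stack on the same MCU.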



Deep Reinforcement Learning


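Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning with deep learning techniques. In traditional reinforcement learning, an agent uses the rewards it receives from the environment to learn behavior that maximizes its cumulative reward. Model-free reinforcement learning methods, however, rely on function approximation, especially in large or continuous state spaces, so that the agent can learn a value function or a policy. Deep learning's strong function-approximation ability therefore became a natural replacement for hand-specified features and made higher-performing end-to-end learning possible.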
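To make the function-approximation point concrete, here is a minimal sketch of the approach that deep learning replaces: one semi-gradient Q-learning step with a linear action-value function over a hand-designed feature vector phi(s). The function names, the step size `alpha`, and the discount `gamma` are illustrative assumptions, and this excerpt does not say which reinforcement learning algorithm the paper itself uses. In deep Q-learning methods such as DQN, the hand-designed features and linear weights are replaced by a neural network trained toward a similar bootstrapped target.

```python
def q_value(w, phi, a):
    """Linear action-value estimate Q(s, a) = w[a] . phi(s)."""
    return sum(w_i * f_i for w_i, f_i in zip(w[a], phi))


def q_learning_update(w, phi, a, reward, phi_next, done, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step; updates w in place, returns the TD error.

    w        -- list of weight vectors, one per action
    phi      -- feature vector of the current state s
    phi_next -- feature vector of the next state s'
    done     -- True if s' is terminal (no bootstrapping)
    """
    target = reward
    if not done:
        target += gamma * max(q_value(w, phi_next, b) for b in range(len(w)))
    td_error = target - q_value(w, phi, a)
    # The gradient of w[a] . phi with respect to w[a] is phi itself.
    for i, f_i in enumerate(phi):
        w[a][i] += alpha * td_error * f_i
    return td_error


if __name__ == "__main__":
    # Two actions, two hand-designed features (both hypothetical).
    w = [[0.0, 0.0], [0.0, 0.0]]
    td = q_learning_update(w, [1.0, 0.5], a=0, reward=1.0,
                           phi_next=[0.0, 1.0], done=True, alpha=0.5)
    print(td, w)  # 1.0 [[0.5, 0.25], [0.0, 0.0]]
```

The quality of this learner is bounded by how well the hand-chosen features describe the state; replacing `q_value` with a neural network lets the features themselves be learned from raw inputs.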
