Nano drones are uniquely suited to fully autonomous applications due to their agility, low cost, and small size. However, their constrained form factor limits flight time, sensor payload, and compute capability, which significantly restricts the use of source-seeking nano drones in GPS-denied and highly cluttered environments. The primary goal of our work is to demonstrate the effectiveness of deep reinforcement learning for fully autonomous navigation on highly constrained, general-purpose hardware, and to present a methodology for future applications. To this end, we present a deep reinforcement learning-based light-seeking policy that runs alongside the flight control stack on a commercially available, off-the-shelf ultra-low-power microcontroller (MCU). We describe our methodology for training deep reinforcement learning policies and deploying them on constrained, general-purpose MCUs. By carefully designing the network input, we provide the agent with the features relevant to finding the source while reducing computational cost, enabling inference at up to 100 Hz. We verify our approach in simulation and in-field testing on a Bitcraze Crazyflie, achieving a 94% success rate in a highly cluttered and randomized test environment. The policy demonstrates efficient light seeking: in simulation, it reaches the goal in 65% fewer steps and with 60% shorter paths than a baseline 'roomba' algorithm.
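To make "inference on a general-purpose MCU" concrete, here is a minimal sketch of what the on-board forward pass of a small policy network can look like. It is not the authors' network: the abstract gives no architecture, so the layer sizes (`N_IN`, `N_HID`, `N_OUT`), the meaning of the inputs and actions, and the names `policy_weights_t`, `policy_act`, and `policy_macs` are all hypothetical. The sketch assumes a two-layer fully connected network with a ReLU hidden layer and a greedy argmax over discrete actions. It uses only fixed-size, caller-owned buffers (no heap), which suits an MCU that also runs the flight control stack. Running at 100 Hz leaves a 10 ms budget per step, and counting multiply-accumulates (MACs) is a quick first check against that budget.

```c
#include <stddef.h>

/* Hypothetical sizes: the abstract does not specify the network architecture. */
#define N_IN  6   /* assumed: compact input features, e.g. light and range readings */
#define N_HID 16  /* assumed hidden-layer width */
#define N_OUT 4   /* assumed discrete actions, e.g. forward/back/left/right */

/* Weights and biases of a hypothetical two-layer fully connected policy. */
typedef struct {
    float w1[N_HID][N_IN];
    float b1[N_HID];
    float w2[N_OUT][N_HID];
    float b2[N_OUT];
} policy_weights_t;

/* Forward pass with fixed-size stack buffers (no dynamic allocation).
 * Returns the index of the highest-scoring action; ties keep the lowest index. */
int policy_act(const policy_weights_t *p, const float obs[N_IN])
{
    float hid[N_HID];

    /* Hidden layer: affine transform followed by ReLU. */
    for (int h = 0; h < N_HID; ++h) {
        float acc = p->b1[h];
        for (int i = 0; i < N_IN; ++i)
            acc += p->w1[h][i] * obs[i];
        hid[h] = acc > 0.0f ? acc : 0.0f;
    }

    /* Output layer: one score per action, tracked with a running argmax. */
    int best = 0;
    float best_q = 0.0f;
    for (int a = 0; a < N_OUT; ++a) {
        float acc = p->b2[a];
        for (int h = 0; h < N_HID; ++h)
            acc += p->w2[a][h] * hid[h];
        if (a == 0 || acc > best_q) {
            best = a;
            best_q = acc;
        }
    }
    return best;
}

/* Multiply-accumulate operations per inference (biases and ReLU not counted). */
size_t policy_macs(void)
{
    return (size_t)N_IN * N_HID + (size_t)N_HID * N_OUT;
}
```

With these assumed sizes the network needs 6·16 + 16·4 = 160 MACs per step, a tiny workload at 100 Hz. This shows why a carefully chosen, compact input representation, rather than raw high-dimensional sensor data, keeps the policy within an ultra-low-power MCU's compute budget.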