Image-goal navigation (ImageNav) tasks a robot with autonomously exploring an unknown environment and reaching a location that visually matches a given target image. While prior work primarily studies ImageNav for ground robots, enabling this capability for autonomous drones is substantially more challenging, since stable flight demands high-frequency feedback control and global localization. In this paper, we propose a novel sim-to-real framework that leverages reinforcement learning (RL) to achieve ImageNav for drones. To strengthen visual representations, our approach trains the vision backbone with auxiliary tasks, including image perturbations and future transition prediction, which yields more effective policy training. The proposed algorithm enables end-to-end ImageNav with direct velocity control, eliminating the need for external localization. Furthermore, we integrate a depth-based safety module for real-time obstacle avoidance, allowing the drone to navigate safely in cluttered environments. Unlike most existing drone navigation methods, which focus solely on reference tracking or obstacle avoidance, our framework supports comprehensive navigation behaviors, including autonomous exploration, obstacle avoidance, and image-goal seeking, without requiring explicit global mapping. Code and model checkpoints are available at https://github.com/Zichen-Yan/SIGN.