A Visual Active Search Framework for Geospatial Exploration

Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. A crucial feature of VAS is that each such query is informative about the spatial distribution of target objects beyond what is captured visually (for example, due to spatial correlation). We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.
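The query loop described above can be sketched in a few lines. This is only an illustration of the VAS setting, not the proposed reinforcement learning policy: it swaps in a greedy rule, and a hand-coded neighbor update stands in for spatial correlation. The function name `greedy_active_search`, the grid representation, and the `boost` parameter are all assumptions made for this sketch.

```python
def greedy_active_search(prior, labels, budget, boost=0.2):
    """Greedy stand-in for a VAS policy (illustrative assumption, not the paper's method).

    prior  : 2D list of initial scores per grid cell (e.g. from an image model).
    labels : 2D list of 0/1 ground truth, revealed only for queried cells.
    budget : maximum number of queries allowed.
    boost  : hypothetical amount added to (or subtracted from) neighboring
             scores after a positive (or negative) query outcome.
    Returns (history, found): the list of (row, col, outcome) queries and
    the number of targets discovered.
    """
    rows, cols = len(prior), len(prior[0])
    scores = [[float(v) for v in row] for row in prior]
    queried = set()
    history, found = [], 0

    for _ in range(min(budget, rows * cols)):
        # Choose the unqueried cell with the highest current score
        # (ties resolve to the first cell in row-major order).
        candidates = [(r, c) for r in range(rows) for c in range(cols)
                      if (r, c) not in queried]
        r, c = max(candidates, key=lambda rc: scores[rc[0]][rc[1]])

        # Query the cell: this reveals whether a target is present there.
        outcome = int(labels[r][c])
        queried.add((r, c))
        history.append((r, c, outcome))
        found += outcome

        # Use the outcome as extra evidence about nearby cells:
        # a hit raises scores in the 3x3 neighborhood, a miss lowers them.
        delta = boost if outcome else -boost
        for nr in range(max(r - 1, 0), min(r + 2, rows)):
            for nc in range(max(c - 1, 0), min(c + 2, cols)):
                scores[nr][nc] += delta

    return history, found


if __name__ == "__main__":
    # Toy 3x3 task: targets cluster in the top-left corner.
    prior = [[0.5, 0.1, 0.1],
             [0.1, 0.1, 0.1],
             [0.1, 0.1, 0.1]]
    labels = [[1, 1, 0],
              [1, 0, 0],
              [0, 0, 0]]
    history, found = greedy_active_search(prior, labels, budget=3)
    print(history, found)
```

In this toy task the first hit raises the scores of adjacent cells, so later queries stay inside the cluster. A learned policy would replace both the selection rule and the hand-coded update with functions trained on annotated search tasks.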
