Synthetic Data-based Detection of Zebras in Drone Imagery

Datasets for training detectors of common objects or humans are widely available. These come in the form of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. In contrast, data for uncommon scenarios, such as aerial views, for animals, such as wild zebras, or for difficult-to-obtain information, such as human shapes, are hardly available. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and has advanced tasks like target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector cannot identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use this dataset to train a YOLO detector from scratch. Through extensive evaluations of our model on real-world data from i) limited datasets available on the internet and ii) a new one collected and manually labelled by us, we show that we can detect zebras by using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open-source at https://keeper.mpdl.mpg.de/d/12abb3bb6b12491480d5/.
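
The abstract does not say how detector labels are derived from the synthetic data. One plausible route, sketched here purely as an assumption, is to turn each per-subject instance segmentation into a bounding box in the common YOLO text-label convention (class index, then box centre and size normalised by image width and height). The function name `mask_to_yolo_labels`, the use of a 2D list of integer instance IDs, and the choice of `0` as the background ID are all hypothetical and not taken from GRADE or from the authors' code.

```python
def mask_to_yolo_labels(instance_mask, class_id=0, background_id=0):
    """Convert a 2D instance-ID mask into YOLO-style normalised boxes.

    instance_mask: list of rows, each a list of integer instance IDs
                   (hypothetical layout, assumed for illustration).
    Returns a list of (class_id, x_center, y_center, width, height),
    one entry per instance, with all box values in [0, 1].
    """
    height = len(instance_mask)
    width = len(instance_mask[0])

    # instance id -> [x_min, y_min, x_max, y_max] in pixels (max is exclusive)
    boxes = {}
    for y, row in enumerate(instance_mask):
        for x, instance_id in enumerate(row):
            if instance_id == background_id:
                continue
            box = boxes.setdefault(instance_id, [x, y, x + 1, y + 1])
            box[0] = min(box[0], x)
            box[1] = min(box[1], y)
            box[2] = max(box[2], x + 1)
            box[3] = max(box[3], y + 1)

    labels = []
    for instance_id in sorted(boxes):
        x_min, y_min, x_max, y_max = boxes[instance_id]
        labels.append((
            class_id,
            (x_min + x_max) / 2 / width,   # normalised box centre, x
            (y_min + y_max) / 2 / height,  # normalised box centre, y
            (x_max - x_min) / width,       # normalised box width
            (y_max - y_min) / height,      # normalised box height
        ))
    return labels


if __name__ == "__main__":
    # Toy 10x20 mask with a single "zebra" instance (ID 1).
    mask = [[0] * 20 for _ in range(10)]
    for y in range(2, 6):
        for x in range(4, 12):
            mask[y][x] = 1
    for label in mask_to_yolo_labels(mask):
        print(" ".join(str(value) for value in label))
```

Because every label comes directly from rendered ground truth, this kind of conversion avoids the missing-label errors that manual annotation of real images is prone to. A real pipeline would likely also have to handle occluded or very small instances, which this sketch ignores.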
