We provide a detailed analysis of convolutional neural networks that are pre-trained on the task of object detection. To this end, we train detectors on large datasets like OpenImagesV4, ImageNet Localization, and COCO. We analyze how well their features generalize to tasks like image classification, semantic segmentation, and object detection on small datasets like PASCAL-VOC, Caltech-256, SUN-397, and Flowers-102. The main conclusions from our analysis are: 1) Pre-training on large detection datasets is crucial for fine-tuning on small detection datasets, especially when precise localization is needed. For example, we obtain 81.1% mAP on the PASCAL-VOC dataset at 0.7 IoU after pre-training on OpenImagesV4, which is 7.6% better than the recently proposed DeformableConvNetsV2, which uses ImageNet pre-training. 2) Detection pre-training also benefits other localization tasks like semantic segmentation but adversely affects image classification. 3) Features of images (like avg. pooled Conv5) that are similar in the object detection feature space are likely to be similar in the image classification feature space, but the converse is not true. 4) Visualization of features reveals that detection neurons have activations over an entire object, while activations for classification networks typically focus on parts. Therefore, detection networks are poor at classification when multiple instances are present in an image or when an instance covers only a small fraction of an image.
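To make the 0.7 IoU criterion in point 1 concrete, here is a minimal sketch (not the paper's code) of intersection-over-union between two axis-aligned boxes given as `(x1, y1, x2, y2)`; at this threshold a detection counts as correct only if its IoU with the ground-truth box is at least 0.7, which is why the metric rewards precise localization.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle; width/height are clamped to 0 when the
    # boxes do not overlap.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A box shifted sideways by a quarter of its width already drops to
# IoU = 0.6, below the 0.7 threshold.
print(iou((0, 0, 100, 100), (25, 0, 125, 100)))  # → 0.6
```

Evaluating mAP at 0.7 IoU (rather than the more common 0.5) therefore penalizes detectors whose boxes are only roughly placed, which is the regime where detection pre-training helps most.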