Centralized Feature Pyramid for Object Detection

Visual feature pyramids have shown their superiority in both effectiveness and efficiency across a wide range of applications. However, existing methods concentrate excessively on inter-layer feature interactions and ignore intra-layer feature regulations, which have been empirically shown to be beneficial. Although some methods try to learn a compact intra-layer feature representation with the help of the attention mechanism or the vision transformer, they neglect the corner regions that are important for dense prediction tasks. To address this problem, we propose a Centralized Feature Pyramid (CFP) for object detection, which is based on a globally explicit centralized feature regulation. Specifically, we first propose a spatial explicit visual center scheme, in which a lightweight MLP captures global long-range dependencies and a parallel learnable visual center mechanism captures the local corner regions of the input images. Based on this, we then propose a globally centralized regulation for the commonly used feature pyramid in a top-down fashion, in which the explicit visual center information obtained from the deepest intra-layer feature is used to regulate the frontal shallow features. Compared with existing feature pyramids, CFP not only captures global long-range dependencies but also efficiently obtains an all-round yet discriminative feature representation. Experimental results on the challenging MS-COCO dataset show that the proposed CFP achieves consistent performance gains on the state-of-the-art YOLOv5 and YOLOX object detection baselines.
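As a rough illustration of the top-down regulation described in the abstract, the following Python sketch passes the deepest pyramid level through two parallel branches, fuses them, and uses the result to adjust the shallower levels. It is not the authors' implementation. The single-channel nested-list feature maps, the function names (`global_branch`, `local_branch`, `toy_visual_center`, `regulate_top_down`), the use of a global mean in place of the lightweight MLP, the 3x3 average in place of the learnable visual center, and the additive fusion are all assumptions chosen to keep the example small and dependency-free.

```python
from typing import List

FeatureMap = List[List[float]]  # hypothetical single-channel H x W map


def global_branch(x: FeatureMap) -> FeatureMap:
    """Stand-in for the lightweight MLP: each position receives the global mean."""
    h, w = len(x), len(x[0])
    mean = sum(sum(row) for row in x) / (h * w)
    return [[mean for _ in range(w)] for _ in range(h)]


def local_branch(x: FeatureMap) -> FeatureMap:
    """Stand-in for the learnable visual center: edge-clipped 3x3 average."""
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [x[a][b]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for b in range(max(0, j - 1), min(w, j + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def toy_visual_center(x: FeatureMap) -> FeatureMap:
    """Fuse the two parallel branches by elementwise sum (assumed fusion rule)."""
    g, l = global_branch(x), local_branch(x)
    return [[a + b for a, b in zip(rg, rl)] for rg, rl in zip(g, l)]


def upsample_nearest(x: FeatureMap, factor: int) -> FeatureMap:
    """Nearest-neighbor upsampling by an integer factor in both dimensions."""
    return [[v for v in row for _ in range(factor)]
            for row in x for _ in range(factor)]


def regulate_top_down(pyramid: List[FeatureMap]) -> List[FeatureMap]:
    """Regulate shallower levels with the deepest level's visual center.

    `pyramid` is ordered shallow to deep; each shallower level is assumed to
    be an integer multiple of the deepest level's resolution.
    """
    center = toy_visual_center(pyramid[-1])
    regulated = []
    for level in pyramid[:-1]:
        factor = len(level) // len(center)
        up = upsample_nearest(center, factor)
        # Additive regulation is an assumption made for this sketch.
        regulated.append([[a + b for a, b in zip(r1, r2)]
                          for r1, r2 in zip(level, up)])
    regulated.append(center)
    return regulated


if __name__ == "__main__":
    shallow = [[0.0] * 4 for _ in range(4)]
    deep = [[1.0, 2.0], [3.0, 4.0]]
    for level in regulate_top_down([shallow, deep]):
        print(level)
```

In a trained detector the two branches would be learned modules operating on multi-channel tensors. The sketch only shows the data flow: the center is computed once at the deepest level and then broadcast to the shallower ones.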
