In recent years, multi-stage strategies have become a popular trend in visual tracking. Such a strategy first uses a base tracker to coarsely locate the target and then applies a refinement module to obtain a more accurate result. However, existing refinement modules suffer from limited transferability and precision. In this work, we propose a novel, flexible and accurate refinement module called Alpha-Refine. It uses a precise pixel-wise correlation layer together with a spatial-aware non-local layer to fuse features, and it predicts three complementary outputs: bounding box, corners and mask. To choose the most suitable output, we also design a light-weight branch selector module. We apply the proposed Alpha-Refine module to five well-known, state-of-the-art base trackers: DiMP, ATOM, SiamRPN++, RTMDNet and ECO. Comprehensive experiments on the TrackingNet, LaSOT and VOT2018 benchmarks demonstrate that our approach significantly improves tracking performance compared with existing refinement methods. The source code will be available at https://github.com/MasterBin-IIAU/AlphaRefine.
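The pixel-wise correlation layer mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the common formulation in which every spatial position of the template feature map acts as a 1x1 kernel that is correlated with every position of the search-region feature map. The function name `pixelwise_correlation` and the tensor shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def pixelwise_correlation(template, search):
    """Correlate every template pixel with every search pixel.

    Assumed shapes (illustrative, not taken from the paper):
      template: (C, Hz, Wz) feature map of the target template
      search:   (C, Hx, Wx) feature map of the search region
    Returns an array of shape (Hz * Wz, Hx, Wx): one correlation map
    per template pixel.
    """
    c, hz, wz = template.shape
    c2, hx, wx = search.shape
    if c != c2:
        raise ValueError("template and search must have the same channel count")

    # Flatten spatial dimensions: each column is one C-dimensional feature vector.
    kernels = template.reshape(c, hz * wz)  # (C, Hz*Wz)
    pixels = search.reshape(c, hx * wx)     # (C, Hx*Wx)

    # Inner product between every template vector and every search vector.
    corr = kernels.T @ pixels               # (Hz*Wz, Hx*Wx)
    return corr.reshape(hz * wz, hx, wx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((4, 2, 3))  # hypothetical template features
    x = rng.standard_normal((4, 5, 6))  # hypothetical search features
    print(pixelwise_correlation(z, x).shape)
```

Because each template position keeps its own output channel, spatial detail of the target is not averaged into a single response map, which is one plausible reason such a layer is described as precise. In the full module, the correlation output would then be fused further (for example by the non-local layer) before the box, corner and mask heads.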