Although deep-learning-based tracking methods have made substantial progress, they require large-scale, high-quality annotated data for sufficient training. To eliminate expensive and exhaustive annotation, we study self-supervised learning for visual tracking. In this work, we develop the Crop-Transform-Paste operation, which synthesizes sufficient training data by simulating the variations encountered during tracking, including changes in object appearance and background interference. Since the target state is known in all synthesized data, existing deep trackers can be trained in the usual way on the synthesized data without human annotation. The proposed target-aware data-synthesis method places existing tracking approaches within a self-supervised learning framework without algorithmic changes, so it can be seamlessly integrated into existing tracking frameworks for training. Extensive experiments show that our method 1) achieves favorable performance against supervised learning schemes when annotations are limited; 2) helps handle tracking challenges such as object deformation, occlusion, and background clutter, owing to its manipulability; 3) performs favorably against state-of-the-art unsupervised tracking methods; and 4) boosts the performance of various state-of-the-art supervised learning frameworks, including SiamRPN++, DiMP, and TransT.
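The idea behind Crop-Transform-Paste can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which transformations are used, so the random rescaling and mirroring below, the nested-list image representation, and every function name (`crop`, `transform`, `paste`, `crop_transform_paste`) are assumptions made for illustration. The sketch only shows why the synthesized sample needs no human annotation: the paste location and size are chosen by the program, so the target box is known by construction.

```python
import random


def crop(image, box):
    """Return the patch inside box = (x, y, w, h) of a row-major image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def transform(patch, out_w, out_h, flip):
    """Resize the patch with nearest-neighbour sampling, optionally mirrored.

    Assumption: scale and flip stand in for the appearance variations
    mentioned in the abstract; the real transform set is not given there.
    """
    src_h, src_w = len(patch), len(patch[0])
    out = [[patch[r * src_h // out_h][c * src_w // out_w]
            for c in range(out_w)]
           for r in range(out_h)]
    if flip:
        out = [row[::-1] for row in out]
    return out


def paste(background, patch, x, y):
    """Copy the patch onto a copy of background, top-left corner at (x, y)."""
    canvas = [row[:] for row in background]
    for r, patch_row in enumerate(patch):
        canvas[y + r][x:x + len(patch_row)] = patch_row
    return canvas


def crop_transform_paste(source, box, background, rng):
    """Synthesize one training image and its target box (x, y, w, h)."""
    patch = crop(source, box)
    bg_h, bg_w = len(background), len(background[0])
    # Hypothetical scale range; clamped so the patch always fits the background.
    scale = rng.uniform(0.5, 1.5)
    out_w = max(1, min(bg_w, round(box[2] * scale)))
    out_h = max(1, min(bg_h, round(box[3] * scale)))
    patch = transform(patch, out_w, out_h, flip=rng.random() < 0.5)
    # The paste position is sampled here, so the label comes for free.
    x = rng.randint(0, bg_w - out_w)
    y = rng.randint(0, bg_h - out_h)
    return paste(background, patch, x, y), (x, y, out_w, out_h)
```

Calling `crop_transform_paste(source, (1, 1, 2, 2), background, random.Random(0))` returns a new image together with the box of the pasted target, which could then be fed to a tracker's usual training loop in place of a human-annotated frame.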