We present FEAR, a novel, fast, efficient, accurate, and robust Siamese visual tracker. We introduce an architecture block for object model adaptation, called dual-template representation, and a pixel-wise fusion block that give the model extra flexibility and efficiency. The dual-template module incorporates temporal information with only a single learnable parameter, while the pixel-wise fusion block encodes more discriminative features with fewer parameters than standard correlation modules. By plugging sophisticated backbones into the novel modules, the FEAR-M and FEAR-L trackers surpass most Siamese trackers on several academic benchmarks in both accuracy and efficiency. Equipped with a lightweight backbone, the optimized version FEAR-XS offers more than 10 times faster tracking than current Siamese trackers while maintaining near state-of-the-art results. The FEAR-XS tracker is 2.4x smaller and 4.3x faster than LightTrack [62] with superior accuracy. In addition, we expand the definition of model efficiency by introducing a benchmark on energy consumption and execution speed. Source code, pre-trained models, and the evaluation protocol will be made available upon request.
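The abstract states that the dual-template module incorporates temporal information with only a single learnable parameter. A minimal sketch of one plausible reading, assuming the dynamic template is a convex combination of the initial-frame template features and features from a confident recent frame, weighted by that one scalar parameter (the exact FEAR formulation is given in the paper body, not here; the function and variable names are illustrative):

```python
import numpy as np

def dual_template(initial_feat, current_feat, w):
    """Blend the fixed first-frame template with a recently extracted one.

    `w` plays the role of the module's single learnable parameter: w = 0
    keeps only the initial template, w = 1 uses only the current frame.
    This is a hypothetical sketch, not the paper's exact update rule.
    """
    return (1.0 - w) * initial_feat + w * current_feat

# Toy feature maps standing in for backbone outputs.
initial = np.ones((4, 4))        # template features from the first frame
current = np.full((4, 4), 3.0)   # features from a confident later frame

blended = dual_template(initial, current, w=0.25)
print(blended[0, 0])  # 0.75 * 1.0 + 0.25 * 3.0 = 1.5
```

Because the blend is a single scalar interpolation, the module adds essentially no inference cost while still letting the template adapt over time.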