Tracking Any Point (TAP) in a video is a challenging computer vision problem with many demonstrated applications in robotics, video editing, and 3D reconstruction. Existing methods for TAP rely heavily on complex tracking-specific inductive biases and heuristics, limiting their generality and potential for scaling. To address these challenges, we present TAPNext, a new approach that casts TAP as sequential masked token decoding. Our model is causal, tracks in a purely online fashion, and removes tracking-specific inductive biases. This enables TAPNext to run with minimal latency and removes the temporal windowing required by many existing state-of-the-art trackers. Despite its simplicity, TAPNext achieves state-of-the-art tracking performance among both online and offline trackers. Finally, we present evidence that many widely used tracking heuristics emerge naturally in TAPNext through end-to-end training. The TAPNext model and code are available at https://tap-next.github.io/.
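To make the abstract's framing concrete, the sketch below illustrates, under loose assumptions, what casting point tracking as sequential masked token decoding can look like: the track is a sequence of position tokens, all but the query masked, and each token is decoded causally, one frame at a time, using only frames seen so far. All names and the toy encoder/decoder (`encode`, `decode`, `track_online`) are hypothetical stand-ins and not the TAPNext implementation.

```python
# A minimal conceptual sketch (not the authors' code) of framing point tracking
# as sequential masked token decoding with a causal, purely online model.
import numpy as np

MASK = None  # placeholder for an unknown (masked) track-position token


def encode(frame):
    # Hypothetical per-frame encoder: mean color as a stand-in feature vector.
    return frame.mean(axis=(0, 1))


def decode(state, prev_xy):
    # Hypothetical decoder: emits the next position token from the running
    # state; a real model would decode a learned token, here we keep it trivial.
    return prev_xy


def track_online(video, query_xy):
    # The track is a token sequence: the query is given, all later frames masked.
    tokens = [query_xy] + [MASK] * (len(video) - 1)
    state = np.zeros(3)
    for t, frame in enumerate(video):
        # Causal update: the state depends only on frames 0..t, so each token
        # can be emitted online with minimal latency and no temporal window.
        state = 0.9 * state + 0.1 * encode(frame)
        if tokens[t] is MASK:
            tokens[t] = decode(state, tokens[t - 1])  # unmask the token at frame t
    return tokens


# Usage: track one query point through a synthetic 8-frame video, frame by frame.
video = np.random.rand(8, 32, 32, 3)
print(track_online(video, np.array([16.0, 16.0])))
```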