【Paopao One Minute】Tracking as an Online Decision-Making Process: Learning a Policy from Streaming Videos with Reinforcement Learning (ICCV2017-31)

June 5, 2018 · 泡泡机器人SLAM (Paopao Robot SLAM)

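One minute a day to take you through papers from the top robotics conferences

Title: Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning

Authors: James Supancic, III, Deva Ramanan

Source: ICCV 2017 (IEEE International Conference on Computer Vision)

Narrator: 糯米 (Nuomi)

Compiled and translated by: 陈诚 (Chen Cheng) (34)

Individuals are welcome to share this post on WeChat Moments. Other organizations or media accounts that wish to republish it should leave a message on the account to request permission.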

Summary

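The authors frame object tracking as an online decision-making process. A tracker has to follow its target despite ambiguous image frames and a limited computational budget. Crucially, it must decide where to look in the upcoming frames, when to reinitialize itself because it believes the target has been lost, and when to update the appearance model of the tracked object. Rather than making these decisions with classic heuristics, the authors learn an optimal decision-making policy by formulating tracking as a partially observable decision-making process (POMDP). The policy is learned with deep reinforcement learning, which needs supervision (a reward signal) only when the track has gone awry. This sparse reward lets the authors train quickly on massive datasets, several orders of magnitude larger than those used in prior work. Interestingly, the authors treat Internet videos as one concatenated, effectively unlimited stream, and both train and evaluate their tracker within that single, unified computational stream.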

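Translator's note: in my view, this undifferentiated way of training and evaluating should yield a tracker that generalizes more broadly, but it also makes the resulting model harder to interpret.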

Abstract

We formulate tracking as an online decision-making process, where a tracking agent must follow an object despite ambiguous image frames and a limited computational budget. Crucially, the agent must decide where to look in the upcoming frames, when to reinitialize because it believes the target has been lost, and when to update its appearance model for the tracked object. Such decisions are typically made heuristically. Instead, we propose to learn an optimal decision-making policy by formulating tracking as a partially observable decision-making process (POMDP). We learn policies with deep reinforcement learning algorithms that need supervision (a reward signal) only when the track has gone awry. We demonstrate that sparse rewards allow us to quickly train on massive datasets, several orders of magnitude more than past work. Interestingly, by treating the data source of Internet videos as unlimited streams, we both learn and evaluate our trackers in a single, unified computational stream.
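The abstract describes a loop in which an agent sees only ambiguous frames and repeatedly chooses to keep tracking, update its appearance model, or reinitialize. It receives a reward only when the track has gone awry. The following is a minimal toy sketch of that loop under stated assumptions, not the authors' method. These pieces are all hypothetical and chosen for illustration: the two-valued `observe` signal, the hidden dynamics in `step`, the per-video reward (-1 if the target stays lost for `max_lost` frames, 0 otherwise), and the tabular Monte Carlo learner `TabularPolicy`, which stands in for the paper's deep reinforcement learning. `run_stream` follows a test-then-train protocol: it scores each video with the current policy and then learns from that same video. This is one plausible reading of "learn and evaluate in a single, unified computational stream".

```python
import random
from dataclasses import dataclass, field

# Hypothetical action set mirroring the three decisions named in the abstract:
# keep tracking, update the appearance model, or reinitialize.
ACTIONS = ("track", "update", "reinit")


def observe(on_target: bool, rng: random.Random) -> str:
    """Noisy confidence reading; the true `on_target` state stays hidden (partial observability)."""
    truthful = rng.random() < 0.8
    if on_target:
        return "high" if truthful else "low"
    return "low" if truthful else "high"


def step(on_target: bool, action: str, rng: random.Random) -> bool:
    """Assumed toy dynamics for the hidden state, not taken from the paper."""
    if action == "reinit":
        return rng.random() < 0.9  # re-detection usually recovers the target
    if action == "update":
        # Updating on the right target slows drift; updating on a lost track cannot recover it.
        return on_target and rng.random() < 0.98
    return on_target and rng.random() < 0.9  # plain tracking drifts more often


@dataclass
class TabularPolicy:
    """Epsilon-greedy Q-table standing in for the paper's deep policy network."""
    epsilon: float = 0.1
    lr: float = 0.05
    q: dict = field(default_factory=dict)

    def act(self, obs: str, rng: random.Random) -> str:
        if rng.random() < self.epsilon:
            return rng.choice(ACTIONS)  # explore
        return max(ACTIONS, key=lambda a: self.q.get((obs, a), 0.0))  # exploit

    def learn(self, visited, reward: float) -> None:
        # Every-visit Monte Carlo update: each (observation, action) pair in the video
        # moves toward the single per-video reward, the only supervision available.
        for obs, action in visited:
            old = self.q.get((obs, action), 0.0)
            self.q[(obs, action)] = old + self.lr * (reward - old)


def run_episode(policy: TabularPolicy, rng: random.Random, n_frames: int = 30, max_lost: int = 3):
    """Track one toy 'video'; return the visited (obs, action) pairs and a sparse reward."""
    on_target, lost_run, visited = True, 0, []
    for _ in range(n_frames):
        obs = observe(on_target, rng)
        action = policy.act(obs, rng)
        visited.append((obs, action))
        on_target = step(on_target, action, rng)
        lost_run = 0 if on_target else lost_run + 1
        if lost_run >= max_lost:
            return visited, -1.0  # track has gone awry: the only time a reward signal appears
    return visited, 0.0  # no failure, so no supervision for this video


def run_stream(n_videos: int = 2000, seed: int = 0):
    """Test-then-train over a stream of videos: score each video, then learn from it."""
    rng = random.Random(seed)
    policy = TabularPolicy()
    scores = []
    for _ in range(n_videos):
        visited, reward = run_episode(policy, rng)
        scores.append(reward)          # online evaluation with the current policy...
        policy.learn(visited, reward)  # ...followed by learning from the same video
    return policy, scores


if __name__ == "__main__":
    policy, scores = run_stream()
    print("early failure rate:", scores[:200].count(-1.0) / 200)
    print("late failure rate:", scores[-200:].count(-1.0) / 200)
```

You can compare the early and late failure rates to get a rough sense of whether the toy policy is improving. With only two observation values, you can also read the learned table directly from `policy.q`. The tabular learner keeps the sketch dependency-free. A closer reproduction would replace it with a deep policy over image features and a real tracker's state.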

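If you are interested in this paper and want to download the full text, follow the 【泡泡机器人SLAM】 WeChat official account (paopaorobot_slam).

Welcome to the Paopao forum, where experts will answer any questions you have about SLAM.

Whether you have a question to ask or want to answer other people's questions, the Paopao forum welcomes you!

Paopao website: www.paopaorobot.org

Paopao forum: http://paopaorobot.org/forums/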