Deep visuomotor policy learning achieves promising results in control tasks such as robotic manipulation and autonomous driving, where the action is generated from the visual input by the neural policy. However, it requires a huge number of online interactions with the training environment, which limits its real-world application. Compared to the popular unsupervised feature learning for visual recognition, feature pretraining for visuomotor control tasks is much less explored. In this work, we aim to pretrain policy representations for driving tasks using hours-long uncurated YouTube videos. A new contrastive policy pretraining method is developed to learn action-conditioned features from video frames with action pseudo labels. Experiments show that the resulting action-conditioned features bring substantial improvements to the downstream reinforcement learning and imitation learning tasks, outperforming the weights pretrained from previous unsupervised learning methods. Code and models will be made publicly available.
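The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one plausible form of action-conditioned contrastive learning, not the authors' implementation. It assumes that each frame has already been encoded into a feature vector, that the action pseudo labels have been discretized into integer bins, and that two frames count as a positive pair when they share the same bin. The function name `action_conditioned_info_nce`, the `temperature` value, and the binning are all hypothetical choices made for illustration.

```python
import numpy as np


def action_conditioned_info_nce(features, pseudo_actions, temperature=0.1):
    """InfoNCE-style loss in which positives share an action pseudo label.

    features:       (N, D) float array of frame embeddings (assumed given).
    pseudo_actions: (N,) int array of discretized action pseudo labels
                    (the discretization is an assumption of this sketch).
    """
    features = np.asarray(features, dtype=np.float64)
    pseudo_actions = np.asarray(pseudo_actions)
    n = features.shape[0]
    if n < 2:
        return 0.0

    # L2-normalize so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)

    # Pairwise similarity logits; an anchor is never compared with itself.
    logits = z @ z.T / temperature
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over all other frames in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    exp_shifted = np.exp(logits - row_max)
    log_prob = logits - row_max - np.log(exp_shifted.sum(axis=1, keepdims=True))

    # Positives: other frames with the same action pseudo label.
    pos_mask = (pseudo_actions[:, None] == pseudo_actions[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive in the batch are skipped
    if not valid.any():
        return 0.0

    # Average log-probability assigned to the positives of each valid anchor.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(np.mean(-pos_log_prob[valid] / pos_count[valid]))
```

Under these assumptions, the loss is small when frames with the same pseudo action lie close together in feature space and large when they do not, which is the sense in which the learned features would be action-conditioned. In a real pipeline the encoder would be trained by backpropagating through this loss in an autodiff framework; NumPy is used here only to keep the arithmetic explicit.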