Behaviour cloning is a commonly used strategy for imitation learning and can be extremely effective in constrained domains. However, when the dynamics of an environment are state-dependent and varying, behaviour cloning places a burden on model capacity and on the number of demonstrations required. This paper introduces switching density networks, which rely on a categorical reparametrisation for hybrid system identification. The resulting network comprises a classification layer followed by a regression layer. We use switching density networks to predict the parameters of hybrid control laws, which a switching layer toggles to produce different controller outputs conditioned on an input state. We show that switching density networks can be used for hybrid system identification in a variety of tasks: they successfully identify the key joint angle goals that make up manipulation tasks, while simultaneously learning image-based goal classifiers and regression networks that predict joint angles from images. We also show that they can cluster the phase space of an inverted pendulum, identifying the balance, spin and pump controllers required to solve this task. Switching density networks can be difficult to train, but we introduce a cross-entropy regularisation loss that stabilises training.
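To make the classification-then-regression structure concrete, the following is a minimal sketch of a forward pass through a switching controller. It is an illustration under stated assumptions, not the authors' implementation: the proportional control law, the hard arg-max switch, and all names (`switching_forward`, `CLF_W`, `CLF_B`, `GAINS`, `GOALS`) are hypothetical, the weights are hand-picked rather than learned, and training (including the cross-entropy regularisation loss) is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def switching_forward(x, clf_w, clf_b, gains, goals):
    """Select a control mode from the state, then apply that mode's law.

    Assumption: each mode k is a proportional controller
    u = gains[k] * (goals[k] - x). The abstract does not fix this form.
    """
    # Classification layer: categorical distribution over control modes.
    probs = softmax(linear(clf_w, clf_b, x))
    # Switching layer: pick the most probable mode (hard switch at test time).
    mode = max(range(len(probs)), key=probs.__getitem__)
    # Regression parameters (gain and goal) define the active control law.
    u = [gains[mode] * (g - xi) for g, xi in zip(goals[mode], x)]
    return mode, probs, u


# Hypothetical hand-picked parameters for a 1-D state with two modes.
CLF_W = [[4.0], [-4.0]]   # mode 0 is preferred for positive states
CLF_B = [0.0, 0.0]
GAINS = [2.0, 0.5]        # one proportional gain per mode
GOALS = [[1.0], [-1.0]]   # one goal state per mode

mode, probs, u = switching_forward([0.5], CLF_W, CLF_B, GAINS, GOALS)
print(mode, u)
```

In this sketch the classifier partitions the state space into regions, and each region is driven by its own goal and gain. This mirrors the role the abstract gives the switching layer, which toggles between controllers conditioned on the input state.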