Learning Latent Actions without Human Demonstrations

We can make it easier for disabled users to control assistive robots by mapping the user's low-dimensional joystick inputs to high-dimensional, complex actions. Prior works learn these mappings from human demonstrations: a non-disabled human either teleoperates or kinesthetically guides the robot arm through a variety of motions, and the robot learns to reproduce the demonstrated behaviors. But this framework is often impractical, because disabled users will not always have access to external demonstrations. Here we instead learn diverse teleoperation mappings without either human demonstrations or pre-defined tasks. Under our unsupervised approach, the robot first optimizes for object state entropy, i.e., it autonomously learns to push, pull, open, close, or otherwise change the state of nearby objects. We then embed these diverse, object-oriented behaviors into a latent space for real-time control: pressing the joystick now causes the robot to perform dexterous motions such as pushing or opening. We show experimentally that, with a best-case human operator, our unsupervised approach actually outperforms teleoperation mappings learned from human demonstrations, particularly when those demonstrations are noisy or imperfect. User study results are less clear-cut: although our approach enables participants to complete tasks with multiple objects more quickly, the unsupervised mapping also learns motions that the human does not need, and these additional behaviors may confuse the human. Videos of the user study: https://youtu.be/BkqHQjsUKDg
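The abstract does not say how object state entropy is estimated or optimized. As a minimal sketch, assuming a common particle-based choice that the source does not confirm, each visited object state can be rewarded by the log of its distance to its k-th nearest neighbour in a batch. States in sparsely visited regions then earn more reward, so a policy that maximizes it (the RL algorithm itself is not shown) is pushed to change objects in many different ways. The function name `knn_entropy_rewards`, the parameter `k`, and the 2-D object states are hypothetical.

```python
import math
from typing import List, Sequence


def knn_entropy_rewards(states: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Per-state intrinsic reward log(1 + distance to the k-th nearest neighbour).

    Assumed particle-based surrogate for object state entropy (hypothetical
    choice, not confirmed by the abstract): spread-out batches score higher.
    """
    if len(states) <= k:
        raise ValueError("need more than k states in the batch")
    rewards = []
    for i, s in enumerate(states):
        # Distances from state i to every other state in the batch.
        dists = sorted(math.dist(s, other) for j, other in enumerate(states) if j != i)
        rewards.append(math.log(1.0 + dists[k - 1]))
    return rewards


# Hypothetical 2-D object states, e.g. (door angle, drawer extension).
clustered = [(0.0, 0.0)] * 5  # the robot never moved the objects
spread = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]

# The spread batch earns a higher mean reward than the clustered one.
for name, batch in (("clustered", clustered), ("spread", spread)):
    r = knn_entropy_rewards(batch, k=3)
    print(name, sum(r) / len(r))
```

Using `log(1 + d)` rather than `log d` keeps the reward finite when two states coincide, as in the clustered batch, where every reward is exactly zero.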

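The abstract likewise does not specify how the learned behaviors are embedded into a latent space. Learned latent-action interfaces typically train a state-conditioned neural autoencoder. The sketch below swaps in a much simpler stand-in: a state-independent linear autoencoder with a 1-D latent, which is the top principal component of the actions, found here by power iteration. It only illustrates the interface, namely an offline fit on high-dimensional actions followed by a decoder that turns a scalar joystick input into a full robot action. All function names and the 3-D toy actions are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_linear_latent_action(
    actions: Sequence[Sequence[float]], iters: int = 200, seed: int = 0
) -> Tuple[Vector, Vector]:
    """Fit a 1-D linear latent space (hypothetical stand-in for a neural autoencoder).

    PCA via power iteration: the returned unit vector is the top principal
    component of the centered actions, i.e. the optimal decoder direction
    for a 1-D linear autoencoder.
    """
    n, dim = len(actions), len(actions[0])
    mean = [sum(a[d] for a in actions) / n for d in range(dim)]
    centered = [[a[d] - mean[d] for d in range(dim)] for a in actions]

    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for _ in range(iters):
        # One power-iteration step: w <- X^T X w, then renormalize.
        proj = [sum(x[d] * w[d] for d in range(dim)) for x in centered]
        new_w = [sum(p * x[d] for p, x in zip(proj, centered)) for d in range(dim)]
        norm = math.sqrt(sum(v * v for v in new_w))
        if norm == 0.0:  # all actions identical: no direction to learn
            break
        w = [v / norm for v in new_w]
    return mean, w


def encode(action: Sequence[float], mean: Vector, direction: Vector) -> float:
    """Project a high-dimensional action onto the 1-D latent axis."""
    return sum((a - m) * d for a, m, d in zip(action, mean, direction))


def decode(z: float, mean: Vector, direction: Vector) -> Vector:
    """Map a 1-D joystick input z to a full-dimensional robot action."""
    return [m + z * d for m, d in zip(mean, direction)]


# Hypothetical 3-D actions that all lie along one unit direction u (e.g. "push").
u = [1 / 3, 2 / 3, 2 / 3]
actions = [
    [0.1 + t * u[0], 0.2 + t * u[1], 0.3 + t * u[2]]
    for t in (-1.0, -0.5, 0.0, 0.5, 1.0)
]
mean, direction = fit_linear_latent_action(actions)

# At run time, deflecting the joystick by z = 0.8 yields a 3-D action.
print(decode(0.8, mean, direction))
```

The sign of a principal axis is arbitrary, so a real interface would fix which joystick direction is "forward" by convention. A linear decoder also moves along one fixed direction per latent dimension regardless of where the objects are; that limitation is the usual motivation for conditioning a nonlinear decoder on the robot's state.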