This paper proposes a technique that enables a robot to learn a control objective function incrementally from a human user's corrections. The corrections can be as simple as directional corrections (corrections that indicate the direction of a control change without indicating its magnitude) applied at some time instances during the robot's motion. We assume only that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an implicit objective function. The proposed method uses the direction of each correction to update the estimate of the objective function via a cutting-plane technique. We establish theoretical results showing that this process of incremental correction and update guarantees convergence of the learned objective function to the implicit one. The method is validated in simulations and in two human-robot games, in which human players teach a 2-link robot arm and a 6-DoF quadrotor system to plan motions in environments with obstacles.
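The idea that a correction's direction alone defines a cut in parameter space can be illustrated with a small toy problem. The sketch below is not the paper's algorithm: it assumes a hypothetical quadratic cost J(u; theta) = 0.5 * ||u - theta||^2 in two dimensions, a simulated human whose correction is any vector within 80 degrees of the true descent direction (with arbitrary magnitude), and the classical central-cut ellipsoid method as a stand-in for the unspecified cutting-plane variant. All function names and parameters are illustrative.

```python
import numpy as np


def ellipsoid_cut(center, P, g):
    """Central-cut ellipsoid update.

    The current set is {x : (x - center)^T P^{-1} (x - center) <= 1}.
    Returns the smallest ellipsoid containing the half of it where
    g @ (x - center) <= 0. The result does not depend on the length of g.
    """
    n = center.size
    Pg = P @ g
    gPg = float(g @ Pg)
    new_center = center - Pg / ((n + 1) * np.sqrt(gPg))
    new_P = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(Pg, Pg) / gPg)
    return new_center, new_P


def simulated_correction(u, theta_true, rng):
    """Hypothetical human: a direction within 80 degrees of the true descent
    direction of the toy cost, scaled by an arbitrary positive magnitude."""
    descent = theta_true - u  # negative gradient of 0.5 * ||u - theta_true||^2
    angle = rng.uniform(-np.deg2rad(80.0), np.deg2rad(80.0))
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return rng.uniform(0.1, 10.0) * (rotation @ descent)


def learn(theta_true, radius=5.0, steps=30, seed=0):
    """Incrementally refine the estimate of theta from directional corrections.

    Assumes ||theta_true|| < radius so the initial ball contains theta_true.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)           # current estimate (ellipsoid center)
    P = radius**2 * np.eye(2)     # initial search set: ball of given radius
    history = []
    for _ in range(steps):
        u = theta                 # robot executes the minimizer of J(u; theta)
        a = simulated_correction(u, theta_true, rng)
        # An improving correction satisfies a @ (theta_true - theta) > 0,
        # so the half-space to keep is {x : (-a) @ (x - theta) <= 0}.
        theta, P = ellipsoid_cut(theta, P, -a)
        history.append((theta.copy(), P.copy()))
    return theta, P, history


if __name__ == "__main__":
    target = np.array([1.5, -2.0])
    estimate, _, _ = learn(target)
    print("estimate:", estimate, "error:", np.linalg.norm(estimate - target))
```

In this toy setting each cut passes through the current estimate, the true parameter always stays inside the shrinking ellipsoid, and rescaling a correction leaves the update unchanged, which mirrors the assumption that only the direction of a correction carries information.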