Sample efficiency is one of the key factors when applying policy search to real-world problems. In recent years, Bayesian Optimization (BO) has become prominent in robotics because it is sample-efficient and requires little prior knowledge. However, one drawback of BO is that it performs poorly in high-dimensional search spaces because it focuses on global search. In the policy search setting, local optimization is typically sufficient, as initial policies are often available, e.g., via meta-learning, kinesthetic demonstrations, or sim-to-real approaches. In this paper, we propose to constrain the policy search space to a sublevel set of the Bayesian surrogate model's predictive uncertainty. This simple yet effective way of constraining the policy update enables BO to scale to high-dimensional spaces (>100 dimensions) and reduces the risk of damaging the system. We demonstrate the effectiveness of our approach on a wide range of problems, including a motor skills task, the adaptation of deep RL agents to new reward signals, and a sim-to-real task for an inverted pendulum system.
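To make the central idea concrete, the following is a minimal sketch of one constrained BO step, not the implementation used in the paper. It assumes a Gaussian process surrogate with an RBF kernel, a threshold of the form `gamma * prior_std` to define the sublevel set, a UCB acquisition function, and random candidate sampling around the initial policy; all of these choices, and all function and parameter names, are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def gp_posterior(x_train, y_train, x_query, lengthscale=1.0, signal_var=1.0, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train, lengthscale, signal_var)
    k_tt = k_tt + noise_var * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, lengthscale, signal_var)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = np.maximum(signal_var - np.sum(v**2, axis=0), 0.0)
    return mean, np.sqrt(var)


def constrained_step(x_train, y_train, candidates, gamma=0.5, beta=2.0,
                     lengthscale=1.0, signal_var=1.0):
    """Pick the UCB maximizer among candidates whose predictive standard
    deviation lies in the sublevel set {x : std(x) <= gamma * prior_std}.
    The threshold parameterization is an assumption made for illustration."""
    mean, std = gp_posterior(x_train, y_train, candidates, lengthscale, signal_var)
    feasible = std <= gamma * np.sqrt(signal_var)
    if not np.any(feasible):
        return None  # no candidate is close enough to the evaluated policies
    ucb = np.where(feasible, mean + beta * std, -np.inf)
    return candidates[np.argmax(ucb)]


# Usage: a hypothetical 100-dimensional policy with one evaluated initial policy.
rng = np.random.default_rng(0)
dim = 100
x0 = np.zeros((1, dim))   # initial policy parameters (placeholder values)
y0 = np.array([0.0])      # its observed return (placeholder value)
scales = rng.uniform(0.0, 0.1, size=(500, 1))
candidates = x0 + rng.normal(size=(500, dim)) * scales
next_policy = constrained_step(x0, y0, candidates, gamma=0.5)
```

Because the predictive uncertainty is small only near policies that have already been evaluated, the constraint keeps each update local. In this sketch the feasible region grows as more evaluations are added to `x_train`, which is one way to read the claim that the search stays cautious while still making progress.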