Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning

Long-horizon planning in realistic environments requires reasoning over sequential tasks in high-dimensional state spaces with complex dynamics. Classical motion planning algorithms, such as rapidly-exploring random trees (RRTs), can efficiently explore large state spaces and compute long-horizon, sequential plans. However, these algorithms generally struggle with complex, stochastic, and high-dimensional state spaces, as well as with narrow passages, which naturally emerge in tasks that interact with the environment. Machine learning offers a promising solution because it can learn general policies that handle complex interactions and high-dimensional observations. However, these policies are generally limited in horizon length. Our approach, Broadly-Exploring, Local-policy Trees (BELT), merges these two approaches to leverage the strengths of both through a task-conditioned, model-based tree search. BELT uses an RRT-inspired tree search to efficiently explore the state space. Locally, the exploration is guided by a task-conditioned, learned policy capable of performing general short-horizon tasks. This task space can be quite general and abstract; its only requirements are that it can be sampled and that it covers the space of useful tasks well. The search is aided by a task-conditioned model that temporally extends dynamics propagation to allow long-horizon search and sequential reasoning over tasks. Experiments demonstrate that BELT can plan long-horizon, sequential trajectories with a goal-conditioned policy and generate robust plans.
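
To make the search structure concrete, the following is a minimal sketch of an RRT-style tree search over sampled tasks, not the BELT implementation. It assumes a toy obstacle-free 2D point world in which a task is simply a target point. The function `model_rollout` is a hand-written stand-in for the learned task-conditioned model and local policy, and the names `Node`, `sample_task`, `model_rollout`, and `plan`, as well as the goal-bias probability, the `reach` distance, and the tolerance, are all hypothetical choices made for illustration.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

State = Tuple[float, float]
Task = Tuple[float, float]  # toy assumption: a task is a 2D target point


@dataclass
class Node:
    state: State
    parent: Optional[int]  # index of the parent node, None for the root
    task: Optional[Task]   # task executed from the parent to reach this node


def sample_task(rng: random.Random) -> Task:
    """Sample a task uniformly from the unit square (the toy task space)."""
    return (rng.random(), rng.random())


def model_rollout(state: State, task: Task, reach: float = 0.2) -> State:
    """Stand-in for a learned task-conditioned model (assumption).

    Predicts the state after the local policy attempts `task` for a short
    horizon: the point moves toward the target by at most `reach`.
    """
    dx, dy = task[0] - state[0], task[1] - state[1]
    dist = math.hypot(dx, dy)
    if dist <= reach:
        return task
    scale = reach / dist
    return (state[0] + dx * scale, state[1] + dy * scale)


def plan(start: State, goal: State, tol: float = 0.05,
         max_iters: int = 2000, seed: int = 0) -> Optional[List[Task]]:
    """Grow a tree by sampling tasks and return a task sequence, or None."""
    rng = random.Random(seed)
    nodes = [Node(start, None, None)]
    for _ in range(max_iters):
        # Goal bias of 10% is a common RRT heuristic, assumed here.
        task = goal if rng.random() < 0.1 else sample_task(rng)
        # RRT-style selection: extend the node closest to the sampled task.
        nearest = min(range(len(nodes)),
                      key=lambda i: math.dist(nodes[i].state, task))
        new_state = model_rollout(nodes[nearest].state, task)
        nodes.append(Node(new_state, nearest, task))
        if math.dist(new_state, goal) <= tol:
            # Walk back to the root to recover the sequence of tasks.
            tasks: List[Task] = []
            i = len(nodes) - 1
            while nodes[i].parent is not None:
                tasks.append(nodes[i].task)
                i = nodes[i].parent
            return tasks[::-1]
    return None


if __name__ == "__main__":
    task_sequence = plan(start=(0.1, 0.1), goal=(0.9, 0.9))
    print(task_sequence)
```

In this sketch each tree edge is one temporally extended model prediction rather than a single low-level dynamics step, which is the property the abstract attributes to the task-conditioned model. A learned model and policy would replace `model_rollout`; nothing here reflects how those components are trained.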
