Vectorized Online POMDP Planning

Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning problems under partial observability, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from the massive parallelism of today's hardware, but parallelizing POMDP solvers has proven challenging. Existing solvers interleave numerical optimization over actions with the estimation of action values, which creates dependencies and synchronization bottlenecks between parallel processes that can quickly offset the benefits of parallelization. In this paper, we propose the Vectorized Online POMDP Planner (VOPP), a novel parallel online solver that leverages a recent POMDP formulation which analytically solves part of the optimization component, leaving only the estimation of expectations to numerical computation. VOPP represents all planning-related data structures as a collection of tensors and implements every planning step as a fully vectorized computation over this representation. The result is a massively parallel solver with no dependencies or synchronization bottlenecks between parallel computations. Experimental results indicate that VOPP is at least 20× more efficient at computing near-optimal solutions than an existing state-of-the-art parallel online solver.
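To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch of (a) estimating expectations with one batched Monte Carlo pass over a tensor of (action, particle) rollouts, and (b) an action optimization that has a closed-form solution, so no numerical search over actions is needed. This is an illustration under stated assumptions, not VOPP's actual algorithm. The abstract does not specify which formulation VOPP builds on. The KL-regularized ("soft") maximization used here is simply a standard example of an analytically solvable optimization. Everything else is hypothetical and exists only for illustration: the toy 1-D dynamics and reward, the uniformly random default rollout policy, and the names `rollout_values`, `soft_value`, `prior` and `eta`.

```python
import numpy as np

def rollout_values(particles, first_actions, rng, horizon=10, gamma=0.95, n_actions=3):
    """Monte Carlo estimate of the value of each candidate first action.

    particles: (P,) array of sampled states (belief represented as a particle set).
    first_actions: (A,) array of candidate first actions in {0, ..., n_actions-1}.
    Returns an (A,) array of estimated expected discounted returns.
    """
    A, P = first_actions.shape[0], particles.shape[0]
    # Tensorize: every (action, particle) pair is an independent rollout.
    states = np.broadcast_to(particles, (A, P)).copy()
    actions = np.broadcast_to(first_actions[:, None], (A, P))
    returns = np.zeros((A, P))
    discount = 1.0
    # The only loop is over time; all A*P rollouts advance in one array operation.
    for _ in range(horizon):
        # Hypothetical toy dynamics: action a shifts the state by (a - 1), plus noise.
        states = states + (actions - 1) + rng.normal(0.0, 0.1, size=(A, P))
        # Hypothetical reward: stay close to the origin.
        returns += discount * -np.abs(states)
        discount *= gamma
        # After the first step, follow a (hypothetical) uniformly random default policy.
        actions = rng.integers(0, n_actions, size=(A, P))
    # Expectation over the belief: a single reduction along the particle axis.
    return returns.mean(axis=1)

def soft_value(q, prior, eta=1.0):
    """Closed-form solution of max_pi  E_pi[q] - eta * KL(pi || prior).

    The maximizer is pi(a) proportional to prior(a) * exp(q(a) / eta), and the
    optimal value is eta * log sum_a prior(a) * exp(q(a) / eta).
    """
    z = q / eta + np.log(prior)
    m = z.max()                                # shift for numerical stability
    log_norm = m + np.log(np.exp(z - m).sum())
    policy = np.exp(z - log_norm)              # normalized soft-max policy
    return eta * log_norm, policy

# Usage: a belief concentrated around state +2; action 0 moves toward the origin.
rng = np.random.default_rng(0)
belief = rng.normal(2.0, 0.5, size=4096)
first = np.arange(3)
q = rollout_values(belief, first, rng)
value, policy = soft_value(q, prior=np.full(3, 1 / 3), eta=0.5)
print(q, value, policy)
```

Because the action choice is computed by a formula (a log-sum-exp) instead of an iterative search, the remaining numerical work reduces to averaging along the particle axis. In this sketch, no rollout ever waits on another rollout's result.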
