We propose a diffusion approximation method for continuous-state Markov Decision Processes (MDPs) that can be used for autonomous navigation and control in unstructured off-road environments. In contrast to most decision-theoretic planning frameworks, which assume a fully known state transition model, we design a method that eliminates this strong assumption, which is often extremely difficult to satisfy in practice. We first take a second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation that relies only on the first and second moments of the transition model. Combining this with a kernel representation of the value function, we design an efficient policy iteration algorithm whose policy evaluation step reduces to a linear system of equations characterized by a finite set of supporting states. We first validate the proposed method through extensive simulations on 2D obstacle-avoidance and 2.5D terrain-navigation problems. The results show that the proposed approach substantially outperforms several baselines. We then develop a system that integrates our decision-making framework with onboard perception and conduct real-world experiments in both cluttered indoor and unstructured outdoor environments. The results on the physical systems further demonstrate the applicability of our method in challenging real-world environments.
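To make the approximation step concrete, the following is a minimal sketch of the expansion the abstract summarizes; the notation ($V$ for the value function, $r$ for the reward, $\gamma$ for the discount factor, and $\mu_a$, $\Sigma_a$ for the first and second moments of the transition under action $a$) is our own assumption and may differ from the paper's. Starting from the Bellman optimality equation and Taylor-expanding the expected next-state value to second order,
\begin{align}
V(s) &= \max_a \Big\{ r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s,a)}\big[ V(s') \big] \Big\}, \\
\mathbb{E}\big[ V(s') \big] &\approx V(s) + \mu_a(s)^{\top} \nabla V(s) + \tfrac{1}{2}\, \mathrm{tr}\!\big( \Sigma_a(s)\, \nabla^2 V(s) \big),
\end{align}
so the optimality condition becomes a second-order partial differential equation in $\nabla V$ and $\nabla^2 V$ that touches the transition model only through its first two moments.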
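As an illustration of how the kernel representation turns policy evaluation into a linear system, the sketch below solves for kernel weights $\alpha$ with $V(s) = \sum_i \alpha_i k(s, s_i)$ over a finite set of supporting states, using the second-order expansion above under a fixed policy. This is a hedged toy reconstruction under our own assumptions (Gaussian RBF kernel, hypothetical drift and noise moments, Tikhonov regularization), not the authors' implementation.
\begin{verbatim}
import numpy as np

def rbf(s, si, ell=0.5):
    """Gaussian RBF kernel k(s, s_i) with its gradient and Hessian in s."""
    d = s - si
    k = np.exp(-(d @ d) / (2.0 * ell**2))
    grad = -(d / ell**2) * k
    hess = (np.outer(d, d) / ell**4 - np.eye(len(s)) / ell**2) * k
    return k, grad, hess

def kernel_policy_evaluation(S, R, Mu, Sig, gamma=0.95, ell=0.5, reg=1e-8):
    """Policy evaluation as one linear solve over N supporting states.

    S   : (N, d)    supporting states
    R   : (N,)      one-step rewards under the fixed policy
    Mu  : (N, d)    first moments of the transition at each state
    Sig : (N, d, d) second (central) moments at each state
    Returns weights alpha with V(s) = sum_i alpha_i * k(s, S[i]).
    """
    N = len(S)
    A = np.empty((N, N))
    for j in range(N):
        for i in range(N):
            k, g, H = rbf(S[j], S[i], ell)
            # Linearized Bellman equation at s_j:
            # V(s_j) = R_j + gamma*(V(s_j) + mu_j^T grad V + 1/2 tr(Sigma_j Hess V))
            A[j, i] = ((1.0 - gamma) * k
                       - gamma * (Mu[j] @ g)
                       - 0.5 * gamma * np.trace(Sig[j] @ H))
    return np.linalg.solve(A + reg * np.eye(N), R)

# Toy usage on a 2D state space with a drift toward the origin.
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(20, 2))             # supporting states
R = -np.linalg.norm(S, axis=1)                   # reward: closer to origin is better
Mu = -0.1 * S                                    # first moment: drift toward origin
Sig = np.tile(0.01 * np.eye(2), (len(S), 1, 1))  # second moment: isotropic noise
alpha = kernel_policy_evaluation(S, R, Mu, Sig)
V0 = sum(a * rbf(np.zeros(2), si)[0] for a, si in zip(alpha, S))
print(f"V(origin) ~= {V0:.3f}")
\end{verbatim}
Each row of the system enforces the linearized Bellman equation at one supporting state; because the kernel's gradient and Hessian are linear in the weights $\alpha$, the entire evaluation step reduces to a single linear solve.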