This paper presents a scalable, fault-tolerant framework for unmanned aerial vehicle (UAV) mission management in complex, uncertain environments. The approach addresses the computational bottleneck of solving large-scale Markov Decision Processes (MDPs), a cornerstone of sequential decision-making in artificial intelligence (AI), through a two-stage decomposition strategy. In the first stage, a factor-based algorithm partitions the global MDP into smaller, goal-specific sub-MDPs by exploiting domain-specific features such as goal priority, fault states, spatial layout, and energy constraints. In the second stage, a priority-based recombination algorithm solves each sub-MDP independently and integrates the results into a unified global policy, using a meta-policy to resolve conflicts. We further present a theoretical analysis showing that, under mild probabilistic independence assumptions, the combined policy is provably equivalent to the optimal policy of the global MDP. By decomposing large MDPs into tractable subproblems with provable global equivalence, the framework improves the scalability of AI decision-making and enables real-time policy updates in complex mission environments. Extensive simulations validate the method, demonstrating orders-of-magnitude reductions in computation time without sacrificing mission reliability or policy optimality. The framework thus provides a practical and robust foundation for scalable, real-time decision-making in UAV mission execution.
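The equivalence claim can be illustrated in its simplest setting. The sketch below is not the authors' algorithm. It assumes two fully independent sub-MDPs: transitions factorize, rewards are additive, both share one discount factor, and a joint action is simply the pair of sub-actions. Both sub-MDPs (`GOAL`, `ENERGY`) and all their numbers are hypothetical, as are the helper names `value_iteration`, `q_value`, and `product_mdp`. The sketch solves each sub-MDP and the product MDP with value iteration, then checks that the joint optimal value equals the sum of the sub-values and that the joint optimal policy is the pair of sub-policies. The priority-based meta-policy is not modeled, because under full independence no conflicts arise.

```python
import itertools

GAMMA = 0.9  # assumed shared discount factor


def q_value(s, a, P, R, V, gamma):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())


def value_iteration(states, actions, P, R, gamma=GAMMA, tol=1e-10):
    """Return optimal values V[s] and a greedy policy pi[s] for a finite MDP.
    P[s][a] maps next_state -> probability; R[s][a] is the immediate reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:  # in-place (Gauss-Seidel) Bellman backups
            best = max(q_value(s, a, P, R, V, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    pi = {s: max(actions, key=lambda a: q_value(s, a, P, R, V, gamma)) for s in states}
    return V, pi


def product_mdp(m1, m2):
    """Build the global MDP from two independent sub-MDPs (hypothetical helper)."""
    (S1, A1, P1, R1), (S2, A2, P2, R2) = m1, m2
    S = list(itertools.product(S1, S2))
    A = list(itertools.product(A1, A2))  # joint action space grows multiplicatively
    P, R = {}, {}
    for s1, s2 in S:
        P[(s1, s2)], R[(s1, s2)] = {}, {}
        for a1, a2 in A:
            # Independence assumption: the joint transition probability factorizes.
            P[(s1, s2)][(a1, a2)] = {(t1, t2): p1 * p2
                                     for t1, p1 in P1[s1][a1].items()
                                     for t2, p2 in P2[s2][a2].items()}
            # Additivity assumption: the joint reward is the sum of sub-rewards.
            R[(s1, s2)][(a1, a2)] = R1[s1][a1] + R2[s2][a2]
    return S, A, P, R


# Hypothetical goal sub-MDP: reach a target area and stay there.
GOAL = (["far", "near"], ["fly", "hover"],
        {"far": {"fly": {"near": 0.8, "far": 0.2}, "hover": {"far": 1.0}},
         "near": {"fly": {"near": 1.0}, "hover": {"near": 1.0}}},
        {"far": {"fly": -1.0, "hover": -0.5},
         "near": {"fly": 2.0, "hover": 1.0}})

# Hypothetical energy sub-MDP: work while the battery is ok, recharge when low.
ENERGY = (["ok", "low"], ["work", "charge"],
          {"ok": {"work": {"ok": 0.7, "low": 0.3}, "charge": {"ok": 1.0}},
           "low": {"work": {"low": 1.0}, "charge": {"ok": 0.9, "low": 0.1}}},
          {"ok": {"work": 1.0, "charge": 0.0},
           "low": {"work": -2.0, "charge": -1.0}})

V1, pi1 = value_iteration(*GOAL)
V2, pi2 = value_iteration(*ENERGY)
VJ, piJ = value_iteration(*product_mdp(GOAL, ENERGY))

# Under full independence, solving the sub-MDPs separately recovers the global optimum.
for s1, s2 in VJ:
    assert abs(VJ[(s1, s2)] - (V1[s1] + V2[s2])) < 1e-6
    assert piJ[(s1, s2)] == (pi1[s1], pi2[s2])
print(pi1, pi2)
```

Even in this toy case the global MDP has 4 states and 4 joint actions, against 2 and 2 for each sub-MDP. With more goals the product grows exponentially, which is the bottleneck that decomposition is meant to avoid. Once the independence assumptions are relaxed, sub-policies can recommend conflicting actions, and a conflict-resolution step such as the paper's meta-policy becomes necessary.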