Multi-Robot Policy Adaptation and Consensus Based on Double-Time-Scale Optimization

Project title: Multi-Robot Policy Adaptation and Consensus Based on Double-Time-Scale Optimization

Project number: No.61473316

Project type: General Program

Year approved: 2015

Discipline: Automation technology; computer technology

Project author: Chen Xin (陈鑫)

Affiliation: China University of Geosciences (Wuhan)

Funding: CNY 820,000

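Chinese abstract (translated): The ability to explore unknown environments and to learn cooperative behaviors autonomously is one of the keys to realizing intelligent multi-robot systems. However, the distributed nature of such systems and the dynamics of individual robots make learning cooperative behavior computationally expensive, hard to generalize, and difficult to apply in engineering practice. To address these problems, this project studies a double-time-scale multi-robot optimization architecture based on Similar-POMDP, which decomposes multi-robot behavior optimization into two interdependent parts, consensus optimal control under time-varying topologies and cooperative policy optimization, thereby reducing the space complexity of policy learning. It studies distributed guaranteed-cost consensus control methods that preserve topology connectivity, so that the learned policies remain realizable. Combining consensus performance evaluation with approximate dynamic programming for general performance indices, it designs cooperative policy optimization algorithms based on non-parametric critics, achieving effective generalization and adaptive optimization of multi-robot cooperative policies without a model of the environment. Combining graph decomposition with multi-agent coordinated learning, it studies distributed methods for optimizing cooperative policies, improving the engineering applicability of the model. The project aims to make distributed policy optimization and consensus control work together at the mechanism level, providing a solution for realizing intelligent multi-robot systems, with significant theoretical and practical value.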

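Chinese keywords (translated): multi-robot systems; double-time-scale optimization; policy adaptation; consensus; similar partially observable Markov decision process (Similar-POMDP)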

English abstract: The abilities to explore unknown environments and to learn cooperative policies online are regarded as key to realizing intelligent multi-robot systems. However, decentralized implementation and the complex dynamics of individual robots lead to high computational complexity, difficult generalization, and poor practical applicability. To address these problems, the project studies a double-time-scale cooperative optimization framework based on Similar-POMDP, in which multi-robot behavior optimization is achieved through coordination between cooperative policy optimization and consensus optimal control under time-varying topologies. This reduces the space complexity of policy learning. To keep the cooperative policies realizable throughout continual policy optimization, a distributed guaranteed-cost consensus protocol that preserves topology connectivity is studied. Then, building on consensus performance evaluation and approximate dynamic programming (ADP) for general performance indices, a cooperative policy optimization algorithm with a non-parametric critic is developed to achieve efficient generalization and adaptive optimization of cooperative policies in unknown, unmodeled environments. Using directed-graph decomposition and multi-agent coordinated learning, the project investigates decentralized implementations of this cooperative policy optimization to improve practical applicability. Ultimately, the project aims to establish a mechanism in which cooperative policy optimization and consensus control operate simultaneously, providing a solution for realizing intelligent multi-robot systems in complex environments. The research has significant theoretical value and application prospects.
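The consensus-control layer mentioned in the abstract can be illustrated with a minimal sketch. This is not the project's method: it assumes the standard first-order linear consensus protocol x_i(k+1) = x_i(k) + ε Σ_{j∈N_i(k)} (x_j(k) − x_i(k)) on undirected graphs, and every name in it (`consensus_step`, `run_consensus`, the `path` and `star` topologies, ε = 0.2) is hypothetical. The topology switches at every step to mimic a time-varying topology, and each graph is assumed to be connected, standing in for the connectivity that the project's guaranteed-cost protocol is intended to preserve. In a double-time-scale scheme, a loop like this would play the fast inner role, while cooperative policy updates would run on a slower time scale (not shown).

```python
from typing import Dict, List

# Hypothetical representation: undirected adjacency lists, robot index -> neighbour indices.
Graph = Dict[int, List[int]]


def consensus_step(x: List[float], graph: Graph, eps: float) -> List[float]:
    """One synchronous update x_i <- x_i + eps * sum_j (x_j - x_i) over current neighbours."""
    return [xi + eps * sum(x[j] - xi for j in graph[i]) for i, xi in enumerate(x)]


def run_consensus(x0: List[float], topologies: List[Graph], eps: float, steps: int) -> List[float]:
    """Run the protocol while cycling through the given topologies (switching graph)."""
    x = list(x0)
    for k in range(steps):
        graph = topologies[k % len(topologies)]  # topology active at step k
        x = consensus_step(x, graph, eps)
    return x


# Two connected (assumed) topologies over 4 robots: a path and a star.
path: Graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star: Graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

# eps = 0.2 is below 1 / (max degree) = 1/3, which keeps each update stable.
x_final = run_consensus([1.0, 4.0, -2.0, 5.0], [path, star], eps=0.2, steps=200)
print([round(v, 4) for v in x_final])  # → [2.0, 2.0, 2.0, 2.0]
```

Because both graphs are undirected, each update preserves the sum of the states, so the robots converge to the average of their initial values (here 2.0) rather than to an arbitrary common value.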

English keywords: Multi-Robot Systems; Double-Time-Scale Optimization; Policy Adaptation; Consensus; Similar-POMDP
