Research on Policy Search Reinforcement Learning Methods for Complex Tasks in Large-Scale Environments

Project title: Research on Policy Search Reinforcement Learning Methods for Complex Tasks in Large-Scale Environments

Project number: No.61502339

Project type: Young Scientists Fund project

Year of approval: 2016

Discipline: Other

Project author: 赵婷婷 (Zhao Tingting)

Author affiliation: Tianjin University of Science and Technology (天津科技大学)

Funding amount: 200,000 CNY

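Abstract (translated from Chinese): Reinforcement learning is an important machine learning approach to sequential decision-making problems; it studies how an agent should make decisions in an unknown environment so as to maximize cumulative reward. Policy search is among the most flexible and effective reinforcement learning methods for decision problems with continuous action spaces. However, for complex tasks in large-scale environments, existing policy search methods have the following limitations: ① they depend on hand-crafted features, which makes it hard to describe high-dimensional, complex state variables explicitly; ② they depend on specialized policy models designed for a given task, which makes it hard to express policies for complex tasks; ③ the objective function is non-convex, which makes it hard to find a globally optimal policy. To address these problems, this project proposes to build a policy search reinforcement learning research scheme for complex tasks in large-scale environments. The work comprises: ① autonomous representation of state variables in large-scale environments; ② deep policy models with strong generalization ability; ③ guided policy search algorithms that aim at the global optimum. By integrating these techniques, the project will put forward a complete reinforcement learning scheme suited to complex tasks in large-scale environments, provide a theoretical basis and technical guidance for intelligent control problems in practical applications, and lay the groundwork for further in-depth research.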

Keywords (translated from Chinese): reinforcement learning; policy search; state representation; deep policy model; guided samples

English abstract: Reinforcement learning (RL), which studies how an agent ought to act in an unknown environment so as to maximize cumulative reward, is a powerful machine learning paradigm for sequential decision making. Policy search is a flexible and powerful reinforcement learning approach, particularly for control problems with continuous action spaces. However, existing policy search approaches run into problems when solving complex control tasks in large-scale environments. Their limitations are as follows: ① The state representation relies on hand-crafted features, which is limiting when expert knowledge is insufficient for high-dimensional, complex state spaces; ② Policy search methods require a specified, low-dimensional policy model before they can be applied, which limits the generality of the policy for general complex tasks; ③ By their nature, complex tasks present a considerable number of local optima, so a poor local optimum can be a serious issue. To solve these problems, we propose a novel policy search framework for complex control tasks in large-scale environments. More specifically, we combine the following three new ideas into a practical and efficient policy search framework: ① Constructing deep neural networks that automatically represent the state directly from high-dimensional sensory input in large-scale environments; ② Exploring recurrent and deep architectures for complex policies with high generality; ③ Designing guided samples for policy search to direct policy learning and avoid poor local optima. Finally, we obtain a novel reinforcement learning architecture for complex tasks in large-scale environments. This research provides key techniques for robot control in real-world problems, and also lays a solid foundation for our further research.

English keywords: Reinforcement Learning; Policy Search; State Representation; Deep Policy Model; Guided Sample
