Research on Reinforcement Learning Control of Complex Continuous Systems Based on Support Vector Machines

Project name: Research on Reinforcement Learning Control of Complex Continuous Systems Based on Support Vector Machines

Project number: No.60804022

Project type: Young Scientists Fund

Approval year: 2009

Discipline: Building science

Project author: Wang Xuesong (王雪松)

Author affiliation: China University of Mining and Technology (中国矿业大学)

Funding: 180,000 yuan

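Chinese abstract (translated): This project studies performance improvements and applications of reinforcement learning algorithms for the learning control of complex continuous systems. Reinforcement learning is formulated as a simple binary classification problem, and Q-learning methods based on a probabilistic support vector machine and on a Gaussian process classifier are proposed. To address the curse of dimensionality that arises when reinforcement learning is applied to elevator group control systems, Bayesian reinforcement learning based on abstract states is proposed. Q-learning based on a cooperative support vector machine is proposed, in which a probabilistic support vector classifier supplies a support vector regressor with real-time, dynamic knowledge to promote value-function learning. Reinforcement learning based on a semi-parametric support vector regression model is proposed, drawing on the rich learning experience of a parametric model. To avoid the degradation of learning performance caused by excessive human intervention, policy-iteration reinforcement learning based on the automatic construction of basis functions on graphs is proposed. To reuse previously collected samples effectively and reduce the variance of the gradient estimate, off-policy Actor-Critic learning based on adaptive importance sampling is proposed. To balance data efficiency against computational efficiency in Critic evaluation, two incremental Actor-Critic learning methods are proposed. To reduce the variance of the gradient estimate and speed up convergence, expectation-maximization policy search based on parameter exploration is proposed. In addition, in light of domestic and international developments related to the project, intelligent optimization and support vector machines were also studied. Based on these results, 1 monograph and 24 academic papers were published, 21 of which are indexed by SCI and Ei.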

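Chinese keywords (translated): complex continuous system; reinforcement learning; support vector machine; intelligent optimization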

English abstract: To solve the learning control problems of complex continuous systems, this project studies performance improvements and applications of reinforcement learning algorithms. Reinforcement learning is formulated as a simple binary classification problem, and Q-learning algorithms based on a probabilistic support vector machine and on a Gaussian process classifier are proposed. To address the curse of dimensionality encountered when reinforcement learning is applied to elevator group scheduling systems with large state spaces, a Bayesian reinforcement learning method based on abstract states is proposed. A Q-learning method based on a cooperative support vector machine is proposed, in which a probabilistic support vector classifier supplies a support vector regressor with dynamic, real-time knowledge to accelerate the learning of the value function. A reinforcement learning algorithm based on a semi-parametric support vector regression model is proposed, taking advantage of the rich learning experience provided by a parametric model. To avoid the degradation of learning performance caused by excessive human intervention, policy iteration reinforcement learning methods based on basis functions constructed automatically on graphs are given. To reuse previously collected samples efficiently and reduce the variance of the gradient estimate, an off-policy Actor-Critic learning method based on adaptive importance sampling is proposed. To balance data efficiency and computational efficiency in the Critic evaluation, two new incremental Actor-Critic algorithms are proposed. To reduce the variance of the gradient estimate and improve the convergence speed, an expectation-maximization policy search method with parameter-based exploration is proposed. In addition, following developments related to the project, intelligent optimization and support vector machines were also studied. Based on these results, one monograph and 24 papers were published, 21 of which are indexed by SCI and Ei.

English keywords: complex continuous system; reinforcement learning; support vector machine; intelligent optimization
