Real-Time Measurement-Driven Reinforcement Learning Control Approach for Uncertain Nonlinear Systems

The paper introduces an interactive machine learning mechanism that processes the measurements of an uncertain, nonlinear dynamic process and thereby recommends an actuation strategy in real time. As a proof of concept, a trajectory-following optimization problem for a Kinova robotic arm is solved using an integral reinforcement learning approach with guaranteed stability for slowly varying dynamics. The solution is implemented using a model-free value iteration process that solves the integral temporal difference equations of the problem. The performance of the proposed technique is benchmarked against that of another model-free high-order approach and is validated under dynamic payloads and disturbances. Unlike its benchmark, the proposed adaptive strategy is capable of handling extreme process variations. This is demonstrated experimentally by introducing static and time-varying payloads close to the rated maximum payload capacity of the manipulator arm. The comparison algorithm exhibited a percent overshoot up to seven times that of the proposed integral reinforcement learning solution. The robustness of the algorithm is further validated by disturbing the real-time adapted strategy gains with white noise with a standard deviation as high as 5%.
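To make the idea of value iteration on an integral temporal difference equation concrete, the following is a minimal sketch and not the paper's implementation. It assumes a hypothetical scalar linear plant dx/dt = a·x + b·u with running cost q·x² + r·u², a quadratic value function V(x) = p·x², and a state reset at the start of every interval. All names and parameter values are illustrative. Unlike the model-free method described in the abstract, this simplified version uses the input gain b in the policy update, so it is only partially model-free; the drift term a appears only inside the simulator that stands in for the measured process.

```python
import math


def rk4_step(f, z, dt):
    """One classical Runge-Kutta step for a small state vector stored as a list."""
    k1 = f(z)
    k2 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k1)])
    k3 = f([zi + 0.5 * dt * ki for zi, ki in zip(z, k2)])
    k4 = f([zi + dt * ki for zi, ki in zip(z, k3)])
    return [
        zi + dt / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
        for zi, s1, s2, s3, s4 in zip(z, k1, k2, k3, k4)
    ]


def run_interval(x0, gain, a, b, q, r, T, steps):
    """Stand-in for the real process: apply u = -gain * x for T seconds.

    Returns the measured end state x(T) and the cost integrated over the interval.
    """
    def f(z):
        x = z[0]
        u = -gain * x
        return [a * x + b * u, q * x * x + r * u * u]

    z = [x0, 0.0]  # [state, accumulated cost]
    dt = T / steps
    for _ in range(steps):
        z = rk4_step(f, z, dt)
    return z[0], z[1]


def irl_value_iteration(a=-1.0, b=1.0, q=1.0, r=1.0, T=0.1, steps=100, iters=200):
    """Value iteration on p * x(t)^2 = integral cost + p * x(t+T)^2."""
    p = 0.0      # value-function weight, V(x) = p * x^2
    gain = 0.0   # feedback gain of the current policy u = -gain * x
    for _ in range(iters):
        x0 = 1.0  # assumption: the state is reset each interval to keep data informative
        xT, cost = run_interval(x0, gain, a, b, q, r, T, steps)
        # Value update from measurements only (no use of a).
        p = (cost + p * xT * xT) / (x0 * x0)
        # Policy improvement; assumption: b is known in this sketch.
        gain = b * p / r
    return p, gain


p, gain = irl_value_iteration()
# Reference solution of the scalar Riccati equation 2*a*p + q - b^2*p^2/r = 0
# for the default parameters a = -1, b = q = r = 1.
p_star = math.sqrt(2.0) - 1.0
print(p, gain, p_star)
```

With the default parameters, the learned weight approaches √2 − 1 ≈ 0.4142, the solution of the corresponding scalar Riccati equation, which gives a simple sanity check on the update rule. A robotic-arm tracking problem would replace the scalar state with a tracking-error vector and the scalar weight with a matrix estimated from several measured intervals.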
