Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian - Zhuanzhi Papers


August 26, 2021

Model-based Chance-Constrained Reinforcement Learning via Separated Proportional-Integral Lagrangian


Baiyu Peng, Jingliang Duan, Jianyu Chen, Shengbo Eben Li, Genjin Xie, Congsheng Zhang, Yang Guan, Yao Mu, Enxin Sun

Safety is essential for reinforcement learning (RL) applied in the real world. Adding chance constraints (or probabilistic constraints) is a suitable way to enhance RL safety under uncertainty. Existing chance-constrained RL methods, such as penalty methods and Lagrangian methods, either exhibit periodic oscillations or learn an over-conservative or unsafe policy. In this paper, we address these shortcomings by proposing a separated proportional-integral Lagrangian (SPIL) algorithm. We first review the constrained policy optimization process from a feedback control perspective, which regards the penalty weight as the control input and the safe probability as the control output. Based on this, the penalty method is formulated as a proportional controller, and the Lagrangian method is formulated as an integral controller. We then unify them and present a proportional-integral Lagrangian method that combines the merits of both, with an integral separation technique to keep the integral value within a reasonable range. To accelerate training, the gradient of the safe probability is computed in a model-based manner. We demonstrate that our method can reduce the oscillations and conservatism of the RL policy in a car-following simulation. To show its practicality, we also apply our method to a real-world mobile robot navigation task, where our robot successfully avoids a moving obstacle with highly uncertain or even aggressive behaviors.
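The feedback-control view described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `SeparatedPIWeight`, the gains `kp` and `ki`, the threshold `sep_threshold`, and the exact update rule are all assumptions chosen for illustration. It only shows the general idea of treating the penalty weight as a control input driven by the gap between the required and the measured safe probability, with the integral term frozen while that gap is large.

```python
from dataclasses import dataclass


@dataclass
class SeparatedPIWeight:
    """Hypothetical PI controller for a penalty weight (illustrative only)."""

    target: float                # required safe probability, e.g. 0.9
    kp: float = 1.0              # proportional gain (assumed value)
    ki: float = 0.1              # integral gain (assumed value)
    sep_threshold: float = 0.2   # errors above this freeze the integral (assumed)
    integral: float = 0.0        # accumulated constraint violation

    def update(self, safe_prob: float) -> float:
        """Return the new penalty weight given the measured safe probability."""
        # A positive error means the policy is less safe than required.
        error = self.target - safe_prob
        # Integral separation: accumulate only while the error is moderate,
        # so a long unsafe warm-up phase cannot wind up the integral term.
        if error <= self.sep_threshold:
            self.integral = max(0.0, self.integral + error)
        # The proportional term reacts to the current error, the integral
        # term to its history; the weight itself is kept non-negative.
        return max(0.0, self.kp * error + self.ki * self.integral)
```

In a training loop, one would call `update` once per policy iteration with the current estimate of the safe probability and use the returned value to weight the constraint term in the policy objective. Setting `ki = 0` leaves a purely proportional (penalty-style) rule, and setting `kp = 0` with an infinite `sep_threshold` leaves a purely integral (Lagrangian-style) rule.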

