May 16, 2022

Reachability Constrained Reinforcement Learning

Dongjie Yu, Haitong Ma, Shengbo Eben Li, Jianyu Chen

From arXiv; accepted by ICML 2022

Constrained reinforcement learning (CRL) has gained significant interest recently, since satisfying safety constraints is critical for real-world problems. However, existing CRL methods that constrain discounted cumulative costs generally lack a rigorous definition and guarantee of safety. In safe control research, by contrast, safety is defined as persistently satisfying certain state constraints. Such persistent safety is possible only on a subset of the state space, called the feasible set, and a largest feasible set exists for a given environment. Recent studies that combine safe control with CRL through energy-based methods, such as the control barrier function (CBF) and the safety index (SI), rely on prior conservative estimates of feasible sets, which harms the performance of the learned policy. To deal with this problem, this paper proposes a reachability CRL (RCRL) method that uses reachability analysis to characterize the largest feasible sets. We characterize the feasible set by the established self-consistency condition, so that a safety value function can be learned and used as a constraint in CRL. We also use multi-time-scale stochastic approximation theory to prove that the proposed algorithm converges to a local optimum, where the largest feasible set can be guaranteed. Empirical results on benchmarks such as safe-control-gym and Safety-Gym validate the learned feasible set, the performance in terms of the optimality criterion, and the constraint satisfaction of RCRL, compared with state-of-the-art CRL baselines.
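The idea of a safety value function obeying a self-consistency condition can be illustrated with a toy tabular sketch. This is not the paper's algorithm or code: it assumes a deterministic, finite environment, the sign convention that a constraint function h(s) <= 0 means the state constraint holds, and one common form of the condition from reachability analysis, V(s) = max(h(s), min_a V(s')). The names `safety_value_iteration`, `h`, and `next_state` are hypothetical and chosen only for this example.

```python
import numpy as np

def safety_value_iteration(h, next_state, max_iters=1000):
    """Fixed-point iteration on V(s) = max(h(s), min_a V(next_state[s, a])).

    h          : (S,) array, assumed convention h(s) <= 0 means "constraint satisfied".
    next_state : (S, A) integer array of deterministic transitions (toy assumption).
    Returns the worst constraint value along the safest possible trajectory from s.
    """
    v = h.astype(float).copy()
    for _ in range(max_iters):
        # Value of the safest successor reachable from each state.
        best_next = v[next_state].min(axis=1)
        # A state is only as safe as the worse of "now" and "the best future".
        v_new = np.maximum(h, best_next)
        if np.array_equal(v_new, v):
            break
        v = v_new
    return v

# Toy chain with 7 states: states 5 and 6 violate the constraint (h > 0).
# In states 0-2 the agent can stay put; in states 3-4 it is forced to move right.
h = np.array([-1, -1, -1, -1, -1, 1, 1], dtype=float)
next_state = np.array([
    [0, 1],  # state 0: stay or step right
    [1, 2],  # state 1: stay or step right
    [2, 3],  # state 2: stay or step right
    [4, 5],  # state 3: forced right by 1 or 2
    [5, 6],  # state 4: forced right by 1 or 2
    [5, 5],  # state 5: absorbing, unsafe
    [6, 6],  # state 6: absorbing, unsafe
])

v = safety_value_iteration(h, next_state)
constraint_ok = np.flatnonzero(h <= 0)  # states that satisfy the constraint right now
feasible = np.flatnonzero(v <= 0)       # states from which safety can be kept forever

print("constraint-satisfying states:", constraint_ok.tolist())
print("largest feasible set:        ", feasible.tolist())
```

In this toy chain, states 3 and 4 satisfy the constraint at the current step but are not feasible, because every action from them eventually reaches an unsafe state. That gap between "currently safe" and "persistently safe" is what the feasible set in the abstract is meant to capture.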
