有短期记忆的可实现强化学习 (Provable Reinforcement Learning with a Short-Term Memory) - 专知论文

会员服务 ·

0

样本复杂度 · 学成 · 部分可观测马尔可夫决策过程 · 矩匹配 · 强化学习 ·

2022 年 2 月 8 日

Provable Reinforcement Learning with a Short-Term Memory

翻译：有短期记忆的可实现强化学习

Yonathan Efroni,Chi Jin,Akshay Krishnamurthy,Sobhan Miryoosefi

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length $m$ rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.

翻译：现实世界的顺序决策问题通常涉及部分可观察性,这要求代理人保持历史记忆以推断潜在状态、计划和做出正确的决定。应对部分可部分可观察性一般是极具挑战性的,因为学习部分可观察的Markov 决策程序(POMDPs)中知道一些最坏的统计和计算障碍。受问题结构在若干物理应用中的影响,以及一种通常使用的技术“框架堆叠”的驱动,本文件提议研究一个新的POMDP的子类,其潜在状态可以被短长度的近期历史解码,其潜在状态可以被短时间的美元平面解码。我们为在表格和丰富观察环境中学习这一类问题的抽样复杂程度设置了一套上下限。我们用一种新型的“移动匹配”方法开发了新的算法,其样本复杂程度与短长度的美元比重成倍,而不是问题地平面,并且从短期的记忆环境中独立地展示了我们学习的足够记忆环境。

0

相关内容

样本复杂度

样本复杂度

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

97+阅读 · 2019年12月23日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

基于工业大数据挖掘的复杂产品总完工时间动态预测

国家自然科学基金

4+阅读 · 2015年12月31日

变工况机械动态信号瞬时耦合的理解、识别与故障预示

国家自然科学基金

2+阅读 · 2015年12月31日

基于原位测定的NiTi记忆合金相变塑性的非均匀多场复杂行为研究

国家自然科学基金

0+阅读 · 2012年12月31日

复杂动态场景中鲁棒的视觉跟踪算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用参量结构实现复杂信号环境下盲信号分离方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

异质信道盲会合算法

国家自然科学基金

0+阅读 · 2012年12月31日

滇西老厂富银红土型锰矿次生富集机制及40Ar/39Ar年龄

国家自然科学基金

0+阅读 · 2012年12月31日

稀土掺杂对Co基Heusler合金磁性和费米能级的调控

国家自然科学基金

0+阅读 · 2011年12月31日

复杂网络系统的有限时间同步控制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于EMD的复杂声学环境下语音检测与增强

国家自然科学基金

1+阅读 · 2008年12月31日

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning

Arxiv

1+阅读 · 2022年4月20日

When Is Partially Observable Reinforcement Learning Not Scary?

Arxiv

0+阅读 · 2022年4月19日

Reinforcement Learning Guided by Provable Normative Compliance

Arxiv

0+阅读 · 2022年4月19日

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Arxiv

0+阅读 · 2022年4月19日

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

Arxiv

0+阅读 · 2022年4月18日

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

Arxiv

1+阅读 · 2022年4月15日

Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy

Arxiv

0+阅读 · 2022年4月7日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

VIP会员

文章信息

相关主题

样本复杂度

部分可观测马尔可夫决策过程

相关VIP内容

计算机科学课程与视频课件合集，Computer Science courses with video lectures

计算机科学课程与视频课件合集，Computer Science courses with video lectures

专知会员服务

37+阅读 · 2022年1月24日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【强化学习资源集合】Awesome Reinforcement Learning

【强化学习资源集合】Awesome Reinforcement Learning

专知会员服务

97+阅读 · 2019年12月23日

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

【机器学习基础最新版】（Mathematics for Machine Learning），417页pdf

专知会员服务

244+阅读 · 2019年10月21日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

俄乌战争启示：坦克战与不断演变的战斗形态

《大规模作战行动中与无人机集成的C5ISR系统》

《主观概率约束下寻找可行系统及其军事应用》69页

《美政府问责局：多种挑战影响地面战车任务出勤率》2025最新130页

相关资讯

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

A sojourn-based approach to semi-Markov Reinforcement Learning

Arxiv

0+阅读 · 2022年4月20日

Mingling Foresight with Imagination: Model-Based Cooperative Multi-Agent Reinforcement Learning

Arxiv

1+阅读 · 2022年4月20日

When Is Partially Observable Reinforcement Learning Not Scary?

Arxiv

0+阅读 · 2022年4月19日

Reinforcement Learning Guided by Provable Normative Compliance

Arxiv

0+阅读 · 2022年4月19日

Efficient Bayesian Policy Reuse with a Scalable Observation Model in Deep Reinforcement Learning

Arxiv

0+阅读 · 2022年4月19日

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

CHAI: A CHatbot AI for Task-Oriented Dialogue with Offline Reinforcement Learning

Arxiv

0+阅读 · 2022年4月18日

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

Arxiv

1+阅读 · 2022年4月15日

Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy

Arxiv

0+阅读 · 2022年4月7日

Transfer Learning in Deep Reinforcement Learning: A Survey

Transfer Learning in Deep Reinforcement Learning: A Survey

Arxiv

23+阅读 · 2020年9月16日

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey

Arxiv

20+阅读 · 2020年3月10日

相关基金

基于工业大数据挖掘的复杂产品总完工时间动态预测

国家自然科学基金

4+阅读 · 2015年12月31日

变工况机械动态信号瞬时耦合的理解、识别与故障预示

国家自然科学基金

2+阅读 · 2015年12月31日

基于原位测定的NiTi记忆合金相变塑性的非均匀多场复杂行为研究

国家自然科学基金

0+阅读 · 2012年12月31日

复杂动态场景中鲁棒的视觉跟踪算法研究

国家自然科学基金

0+阅读 · 2012年12月31日

利用参量结构实现复杂信号环境下盲信号分离方法研究

国家自然科学基金

0+阅读 · 2012年12月31日

异质信道盲会合算法

国家自然科学基金

0+阅读 · 2012年12月31日

滇西老厂富银红土型锰矿次生富集机制及40Ar/39Ar年龄

国家自然科学基金

0+阅读 · 2012年12月31日

稀土掺杂对Co基Heusler合金磁性和费米能级的调控

国家自然科学基金

0+阅读 · 2011年12月31日

复杂网络系统的有限时间同步控制研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于EMD的复杂声学环境下语音检测与增强

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员