Developing cooperative policies for multi-stage reinforcement learning tasks - Zhuanzhi (专知) Papers


May 11, 2022

Developing cooperative policies for multi-stage reinforcement learning tasks


Jordan Erskine, Chris Lehnert

From arXiv. This paper supersedes the rejected paper "Developing cooperative policies for multi-stage tasks". arXiv admin note: substantial text overlap with arXiv:2007.00203

Many hierarchical reinforcement learning algorithms utilise a series of independent skills as a basis to solve tasks at a higher level of reasoning. These algorithms do not consider the value of using skills that are cooperative instead of independent. This paper proposes the Cooperative Consecutive Policies (CCP) method, which enables consecutive agents to cooperatively solve long-time-horizon, multi-stage tasks. The method modifies the policy of each agent to maximise both the current and the next agent's critic. Cooperatively maximising critics allows each agent to take actions that are beneficial for its own task as well as for subsequent tasks. In a multi-room maze domain and a peg-in-hole manipulation domain, the cooperative policies outperformed a set of naive policies, a single agent trained across the entire domain, and another sequential HRL algorithm.
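The abstract does not state the exact objective, so the following is only a minimal sketch of the idea of maximising both the current and the next agent's critic. It assumes the two critic values are blended with a convex weight `eta`; that parameter, the function names, and the toy two-room critics are all hypothetical illustrations and are not taken from the paper.

```python
from typing import Callable, Sequence


def cooperative_score(q_current: float, q_next: float, eta: float) -> float:
    """Blend the current agent's critic value with the next agent's.

    `eta` is a hypothetical mixing weight (an assumption of this sketch):
    eta = 0 ignores the next agent, which corresponds to a naive policy.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    return (1.0 - eta) * q_current + eta * q_next


def select_action(
    actions: Sequence[int],
    q_current: Callable[[int], float],
    q_next: Callable[[int], float],
    eta: float,
) -> int:
    """Pick the action with the highest blended critic value."""
    return max(
        actions,
        key=lambda a: cooperative_score(q_current(a), q_next(a), eta),
    )


# Toy two-room setting: the first agent chooses where (0..4) to leave room 1.
ACTIONS = [0, 1, 2, 3, 4]


def q_room1(exit_pos: int) -> float:
    # Hypothetical critic of the current agent: every exit completes
    # stage 1, and exits with a lower index are slightly cheaper to reach.
    return 1.0 - 0.05 * exit_pos


def q_room2(exit_pos: int) -> float:
    # Hypothetical critic of the next agent, evaluated at the hand-over
    # state: it prefers to start close to position 4.
    return 1.0 - 0.2 * abs(4 - exit_pos)


naive_action = select_action(ACTIONS, q_room1, q_room2, eta=0.0)
cooperative_action = select_action(ACTIONS, q_room1, q_room2, eta=0.5)
print(naive_action, cooperative_action)
```

In this toy case the naive choice is the exit that is cheapest for the first agent, while the blended score moves the choice toward the hand-over state that the second agent's critic values more. In a learned setting the same blend would presumably enter the policy loss rather than a discrete argmax, which is again an assumption of this sketch.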


