Online Decision Transformer - Zhuanzhi Papers


July 13, 2022

Online Decision Transformer


Qinqing Zheng, Amy Zhang, Aditya Grover

From arXiv; accepted to ICML 2022 (Long Oral Presentation)

Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.
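The abstract does not spell out the objective, so the following is only a rough sketch of what a sequence-level entropy regularizer combined with a likelihood objective could look like. It assumes a diagonal Gaussian action distribution, a negative log-likelihood loss averaged over a length-K subsequence, and a temperature that trades likelihood against entropy, in the style of maximum-entropy RL methods. The function names, the `temperature` and `target_entropy` parameters, and the Gaussian assumption are all illustrative and are not taken from the paper.

```python
import numpy as np


def gaussian_log_prob(actions, mean, log_std):
    """Log-density of actions under a diagonal Gaussian, summed over action dims."""
    var = np.exp(2.0 * log_std)
    per_dim = -0.5 * ((actions - mean) ** 2 / var + 2.0 * log_std + np.log(2.0 * np.pi))
    return per_dim.sum(axis=-1)


def gaussian_entropy(log_std):
    """Differential entropy of a diagonal Gaussian, summed over action dims."""
    return (log_std + 0.5 * np.log(2.0 * np.pi * np.e)).sum(axis=-1)


def entropy_regularized_loss(actions, mean, log_std, temperature, target_entropy):
    """Hypothetical sequence-level loss (an assumption, not the paper's exact form).

    All arrays have shape (K, action_dim), one row per timestep of the subsequence.
    Returns (policy_loss, temperature_loss).
    """
    # Likelihood term: average negative log-likelihood over the K timesteps.
    nll = -gaussian_log_prob(actions, mean, log_std).mean()
    # Entropy is averaged over the whole sequence, not constrained per timestep.
    entropy = gaussian_entropy(log_std).mean()
    # Policy loss: fit the data while being rewarded for keeping entropy high.
    policy_loss = nll - temperature * entropy
    # Temperature loss: minimizing this in the temperature raises it when the
    # entropy falls below the target and lowers it otherwise.
    temperature_loss = temperature * (entropy - target_entropy)
    return policy_loss, temperature_loss


if __name__ == "__main__":
    K, action_dim = 4, 3
    rng = np.random.default_rng(0)
    mean = rng.normal(size=(K, action_dim))
    log_std = np.full((K, action_dim), -0.5)
    actions = mean + np.exp(log_std) * rng.normal(size=(K, action_dim))
    print(entropy_regularized_loss(actions, mean, log_std, 0.1, -float(action_dim)))
```

In a real training loop the mean and log standard deviation would come from the sequence model's output head and both losses would be minimized with an autodiff framework; NumPy is used here only to keep the arithmetic visible.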

