The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models - 专知论文


March 6, 2023

The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models


Raphael Avalos, Florent Delgrange, Ann Nowé, Guillermo A. Pérez, Diederik M. Roijers

Partially Observable Markov Decision Processes (POMDPs) are useful tools to model environments where the full state cannot be perceived by an agent. As such the agent needs to reason taking into account the past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Keeping a probability distribution that models the belief over what the true state is can be used as a sufficient statistic of the history, but its computation requires access to the model of the environment and is also intractable. Current state-of-the-art algorithms use Recurrent Neural Networks (RNNs) to compress the observation-action history aiming to learn a sufficient statistic, but they lack guarantees of success and can lead to suboptimal policies. To overcome this, we propose the Wasserstein-Belief-Updater (WBU), an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of our approximation ensuring that our outputted beliefs allow for learning the optimal value function.
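For readers unfamiliar with the belief update mentioned in the abstract, here is a minimal sketch of the exact Bayesian update for a small tabular POMDP, assuming the transition and observation models are known. The function name `belief_update`, the table layouts `T` and `O`, and the two-state toy problem are illustrative assumptions and are not taken from the paper; WBU is described as learning an approximation of this update precisely because the model is usually not available.

```python
def belief_update(belief, action, observation, T, O):
    """Exact Bayesian belief update for a tabular POMDP (illustrative sketch).

    belief:  list of probabilities over states, summing to 1
    T[a][s][s2]: probability of moving from state s to s2 under action a
    O[a][s2][o]: probability of observing o after reaching s2 via action a
    """
    n = len(belief)
    # Prediction step: push the current belief through the transition model.
    predicted = [
        sum(T[action][s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each successor state by the observation likelihood.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the result is again a probability distribution.
    return [p / total for p in unnormalized]


# Hypothetical two-state problem with a single "listen" action (index 0):
# the state never changes, and the observation matches the state 85% of the time.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]

b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
b2 = belief_update(b1, action=0, observation=0, T=T, O=O)
```

Each call costs time quadratic in the number of states, and it needs `T` and `O` explicitly, which illustrates why the abstract calls the exact computation model-dependent and intractable for large state spaces.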

