POMDPs非政策评价的旁观方法 (A Spectral Approach to Off-Policy Evaluation for POMDPs) - 专知论文

会员服务 ·

0

观测变量 · 重要性采样 · 求逆 · 预测准确率 · 隐状态 ·

2021 年 9 月 22 日

A Spectral Approach to Off-Policy Evaluation for POMDPs

翻译：POMDPs非政策评价的旁观方法

Yash Nair,Nan Jiang

We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes, where the evaluation policy depends only on observable variables but the behavior policy depends on latent states (Tennenholtz et al. (2020a)). Prior work on this problem uses a causal identification strategy based on one-step observable proxies of the hidden state, which relies on the invertibility of certain one-step moment matrices. In this work, we relax this requirement by using spectral methods and extending one-step proxies both into the past and future. We empirically compare our OPE methods to existing ones and demonstrate their improved prediction accuracy and greater generality. Lastly, we derive a separate Importance Sampling (IS) algorithm which relies on rank, distinctness, and positivity conditions, and not on the strict sufficiency conditions of observable trajectories with respect to the reward and hidden-state structure required by Tennenholtz et al. (2020a).

翻译：在部分可观察的Markov决策程序中,我们考虑政策外评价,因为评价政策仅取决于可观察的变量,而行为政策则取决于潜在状态(Tennnholtz等人(2020年a) ) 。先前关于该问题的工作采用了基于隐藏状态的一步可观察的近似值的因果关系识别战略,这一战略依赖于某些一分一秒的矩阵的可视性。在这项工作中,我们通过使用光谱方法,将一步的代理人延伸到过去和将来,放松了这一要求。我们实证地比较了我们的OPE方法与现有的方法,并表明其预测的准确性和更加笼统性。最后,我们得出了一种独立的重要性抽样算法,该算法依赖于等级、区别性和假设性条件,而不是Tenenholtz等人(2020年a)要求的奖赏和隐藏状态结构方面的可观察轨迹的严格充分条件。

0

相关内容

观测变量

【伯克利-Pieter Abbeel】深度强化学习基础，附slides与视频

专知会员服务

29+阅读 · 2021年8月26日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

专知会员服务

55+阅读 · 2020年4月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

4+阅读 · 2019年4月1日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Spectral learning of multivariate extremes

Arxiv

0+阅读 · 2021年11月15日

Observation Contribution Theory for Pose Estimation Accuracy

Arxiv

0+阅读 · 2021年11月15日

The Role of Lookahead and Approximate Policy Evaluation in Policy Iteration with Linear Value Function Approximation

Arxiv

0+阅读 · 2021年11月14日

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes

Arxiv

0+阅读 · 2021年11月12日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank

Arxiv

3+阅读 · 2021年2月11日

Graph Signal Processing -- Part I: Graphs, Graph Spectra, and Spectral Clustering

Arxiv

14+阅读 · 2019年8月12日

A survey on policy search algorithms for learning robot controllers in a handful of trials

Arxiv

3+阅读 · 2018年7月6日

SpectralLeader: Online Spectral Learning for Single Topic Models

Arxiv

4+阅读 · 2018年2月16日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

VIP会员

文章信息

相关主题

重要性采样

预测准确率

相关VIP内容

【伯克利-Pieter Abbeel】深度强化学习基础，附slides与视频

专知会员服务

29+阅读 · 2021年8月26日

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

深度学习优化算法，73页ppt，Optimization Algorithms on Deep Learning

专知会员服务

135+阅读 · 2021年6月16日

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

Fariz Darari简明《博弈论Game Theory》介绍，35页ppt

专知会员服务

112+阅读 · 2020年5月15日

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

【深度伪造综述论文】The Creation and Detection of Deepfakes: A Survey

专知会员服务

55+阅读 · 2020年4月26日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

【伯克利，基于模型的强化学习：理论与实践】《Model-Based Reinforcement Learning:Theory and Practice》，Michael Janner

专知会员服务

35+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

【NTU博士论文】利用强化学习与生成模型推进可靠且可泛化的决策

美海军研发“增强侦察与态势评估系统（ARES）”应用程序以优化作战规划（附研究论文）

【NeurIPS2025】DNA-DetectLLM：基于 DNA 启发的“突变-修复”范式揭示 AI 生成文本

面向深度研究系统的强化学习基础：综述

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

4+阅读 · 2019年4月1日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

CCF C类 | IJCNN 2019 Special Section : 信息论与深度学习

Call4Papers

5+阅读 · 2018年12月7日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Spectral learning of multivariate extremes

Arxiv

0+阅读 · 2021年11月15日

Observation Contribution Theory for Pose Estimation Accuracy

Arxiv

0+阅读 · 2021年11月15日

The Role of Lookahead and Approximate Policy Evaluation in Policy Iteration with Linear Value Function Approximation

Arxiv

0+阅读 · 2021年11月14日

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes

Arxiv

0+阅读 · 2021年11月12日

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

Arxiv

9+阅读 · 2021年2月23日

Robust Generalization and Safe Query-Specialization in Counterfactual Learning to Rank

Arxiv

3+阅读 · 2021年2月11日

Graph Signal Processing -- Part I: Graphs, Graph Spectra, and Spectral Clustering

Arxiv

14+阅读 · 2019年8月12日

A survey on policy search algorithms for learning robot controllers in a handful of trials

Arxiv

3+阅读 · 2018年7月6日

SpectralLeader: Online Spectral Learning for Single Topic Models

Arxiv

4+阅读 · 2018年2月16日

Parameter Space Noise for Exploration

Arxiv

3+阅读 · 2018年1月31日

微信扫码咨询专知VIP会员