离线强化学习中V-Learning的贝尔曼校准 (Bellman Calibration for V-Learning in Offline Reinforcement Learning) - 专知论文

会员服务 ·

0

Learning · 离策略 · 拟合 · 离线强化学习 · 后处理 ·

Bellman Calibration for V-Learning in Offline Reinforcement Learning

翻译：离线强化学习中V-Learning的贝尔曼校准

Lars van der Laan,Nathan Kallus

We introduce Iterated Bellman Calibration, a simple, model-agnostic, post-hoc procedure for calibrating off-policy value predictions in infinite-horizon Markov decision processes. Bellman calibration requires that states with similar predicted long-term returns exhibit one-step returns consistent with the Bellman equation under the target policy. We adapt classical histogram and isotonic calibration to the dynamic, counterfactual setting by repeatedly regressing fitted Bellman targets onto a model's predictions, using a doubly robust pseudo-outcome to handle off-policy data. This yields a one-dimensional fitted value iteration scheme that can be applied to any value estimator. Our analysis provides finite-sample guarantees for both calibration and prediction under weak assumptions, and critically, without requiring Bellman completeness or realizability.

翻译：我们提出了一种简单、模型无关、后处理的迭代贝尔曼校准方法，用于校准无限时域马尔可夫决策过程中的离策略价值预测。贝尔曼校准要求具有相似预测长期回报的状态，在目标策略下展现出与贝尔曼方程一致的单步回报。通过使用双重稳健伪结果处理离策略数据，我们反复将拟合的贝尔曼目标回归到模型的预测上，从而将经典直方图校准和等渗校准方法适配到动态、反事实的场景中。这产生了一个一维的拟合价值迭代方案，可应用于任何价值估计器。我们的分析在弱假设下为校准和预测提供了有限样本保证，并且关键地，无需贝尔曼完备性或可实现性假设。

0

相关内容

Learning

【ICCV2021】参数化对比学习

专知会员服务

33+阅读 · 2021年7月27日

【ICML2021】具有性能保证的弱监督下的对抗性多类学习

专知会员服务

17+阅读 · 2021年7月13日

【ICML2021】基于低秩重参数化的大规模私有学习

专知会员服务

12+阅读 · 2021年6月20日

【NeurIPS2020】无限可能的联合对比学习

专知会员服务

29+阅读 · 2020年10月2日

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

专知会员服务

13+阅读 · 2019年11月17日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

【ICML2020】多视角对比图表示学习，Contrastive Multi-View GRL

【ICML2020】多视角对比图表示学习，Contrastive Multi-View GRL

专知

37+阅读 · 2020年6月11日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

半监督多任务学习：Semisupervised Multitask Learning

半监督多任务学习：Semisupervised Multitask Learning

我爱读PAMI

18+阅读 · 2018年4月29日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

基于随机有限集理论的复杂背景视频多目标跟踪研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

切换系统的容错保成本和容错H无穷控制

国家自然科学基金

0+阅读 · 2015年12月31日

基于分层稀疏表示的微动目标ISAR三维层析成像技术

国家自然科学基金

1+阅读 · 2015年12月31日

高维数据下的模型平均方法

国家自然科学基金

6+阅读 · 2014年12月31日

Profile Bayesian Optimization for Expensive Computer Experiments

Arxiv

0+阅读 · 12月29日

Active Constraint Learning in High Dimensions from Demonstrations

Arxiv

0+阅读 · 12月28日

Confidence Calibration in Vision-Language-Action Models

Arxiv

0+阅读 · 12月22日

A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models

Arxiv

0+阅读 · 12月21日

A Dependent Feature Allocation Model Based on Random Fields

Arxiv

0+阅读 · 12月19日

VIP会员

文章信息

相关主题

离线强化学习

相关VIP内容

【ICCV2021】参数化对比学习

专知会员服务

33+阅读 · 2021年7月27日

【ICML2021】具有性能保证的弱监督下的对抗性多类学习

专知会员服务

17+阅读 · 2021年7月13日

【ICML2021】基于低秩重参数化的大规模私有学习

专知会员服务

12+阅读 · 2021年6月20日

【NeurIPS2020】无限可能的联合对比学习

专知会员服务

29+阅读 · 2020年10月2日

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

实时强化学习《Real-Time Reinforcement Learning》S Ramstedt, C Pal [Mila, Element AI] (2019)

专知会员服务

13+阅读 · 2019年11月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《北约联合仿真与集成、验证与鉴定服务标准》2025最新40页

《面向协同任务的无人地面车辆与无人机（UGV-UAV）集成研究综述》2025最新综述论文

《理解大语言模型在军事战术任务规划中的局限性》

《国防与安全会议论文集》最新80页

相关资讯

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

【ICML2020】多视角对比图表示学习，Contrastive Multi-View GRL

【ICML2020】多视角对比图表示学习，Contrastive Multi-View GRL

专知

37+阅读 · 2020年6月11日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

半监督多任务学习：Semisupervised Multitask Learning

半监督多任务学习：Semisupervised Multitask Learning

我爱读PAMI

18+阅读 · 2018年4月29日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

相关论文

Profile Bayesian Optimization for Expensive Computer Experiments

Arxiv

0+阅读 · 12月29日

Active Constraint Learning in High Dimensions from Demonstrations

Arxiv

0+阅读 · 12月28日

Confidence Calibration in Vision-Language-Action Models

Arxiv

0+阅读 · 12月22日

A Theoretical Lens for RL-Tuned Language Models via Energy-Based Models

Arxiv

0+阅读 · 12月21日

A Dependent Feature Allocation Model Based on Random Fields

Arxiv

0+阅读 · 12月19日

相关基金

基于随机有限集理论的复杂背景视频多目标跟踪研究

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

切换系统的容错保成本和容错H无穷控制

国家自然科学基金

0+阅读 · 2015年12月31日

基于分层稀疏表示的微动目标ISAR三维层析成像技术

国家自然科学基金

1+阅读 · 2015年12月31日

高维数据下的模型平均方法

国家自然科学基金

6+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员