外部政策间隔估计 (Deeply-Debiased Off-Policy Interval Estimation) - 专知论文

会员服务 ·

0

估计/估计量 · 置信度 · HTTPS · 稳健性 · 学成 ·

2021 年 6 月 7 日

Deeply-Debiased Off-Policy Interval Estimation

翻译：外部政策间隔估计

Chengchun Shi,Runzhe Wan,Victor Chernozhukov,Rui Song

Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.

翻译：离政策评价以不同行为政策产生的历史数据集来学习目标政策的价值;除了点数估计外,许多应用将大大受益于一个信任间隔(CI),该间隔可以量化点数估计的不确定性;在本文件中,我们提议了一个新的深刻偏见程序,在目标政策的价值上构建一个高效、稳健和灵活的目标政策指标。我们的方法有理论结果和数字实验的正当理由。在https://github.com/RunzheStat/D2OPE上可以找到拟议程序Python实施情况。

0

相关内容

估计/估计量

估计/估计量

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

专知会员服务

5+阅读 · 2020年1月5日

【干货】大数据入门指南：Hadoop、Hive、Spark、 Storm等

【干货】大数据入门指南：Hadoop、Hive、Spark、 Storm等

专知会员服务

98+阅读 · 2019年12月4日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Optimal Covariate Balancing Conditions in Propensity Score Estimation

Arxiv

0+阅读 · 2021年8月3日

Debiased Off-Policy Evaluation for Recommendation Systems

Arxiv

0+阅读 · 2021年8月2日

Fast and numerically stable particle-based online additive smoothing: the AdaSmooth algorithm

Arxiv

0+阅读 · 2021年8月1日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【ICML2021】异质风险最小化，Heterogeneous Risk Minimization

专知会员服务

16+阅读 · 2021年5月21日

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

【Google DeepMind & 斯坦福 AAAI2020】Options of Interest Temporal Abstraction with Interest Function

专知会员服务

5+阅读 · 2020年1月5日

【干货】大数据入门指南：Hadoop、Hive、Spark、 Storm等

【干货】大数据入门指南：Hadoop、Hive、Spark、 Storm等

专知会员服务

98+阅读 · 2019年12月4日

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

【CoRL2019最佳论文】模仿学习，A Divergence Minimization Perspective on Imitation Learning Methods

专知会员服务

24+阅读 · 2019年11月11日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《无人机系统 - 反无人机系统：测试方法》364页

《无人机蜂群攻击防御的预测建模：面向美军战备的人工智能轨迹预测与最优拦截策略设计》最新报告

美军低成本无人作战攻击系统（LUCAS）：扩大无人机战争规模

《将空中力量带向海洋：美国海军航空发展的四条竞争路径及其教训》报告

相关资讯

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Optimal Covariate Balancing Conditions in Propensity Score Estimation

Arxiv

0+阅读 · 2021年8月3日

Debiased Off-Policy Evaluation for Recommendation Systems

Arxiv

0+阅读 · 2021年8月2日

Fast and numerically stable particle-based online additive smoothing: the AdaSmooth algorithm

Arxiv

0+阅读 · 2021年8月1日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation

Arxiv

8+阅读 · 2018年12月18日

微信扫码咨询专知VIP会员