利用原型学习进行基于个案的非政策政策评价 (Case-based off-policy policy evaluation using prototype learning) - 专知论文

会员服务 ·

0

估计/估计量 · prototype · 策略评估 · 重要性采样 · Performer ·

2021 年 11 月 22 日

Case-based off-policy policy evaluation using prototype learning

翻译：利用原型学习进行基于个案的非政策政策评价

Anton Matsson,Fredrik D. Johansson

Importance sampling (IS) is often used to perform off-policy policy evaluation but is prone to several issues, especially when the behavior policy is unknown and must be estimated from data. Significant differences between the target and behavior policies can result in uncertain value estimates due to, for example, high variance and non-evaluated actions. If the behavior policy is estimated using black-box models, it can be hard to diagnose potential problems and to determine for which inputs the policies differ in their suggested actions and resulting values. To address this, we propose estimating the behavior policy for IS using prototype learning. We apply this approach in the evaluation of policies for sepsis treatment, demonstrating how the prototypes give a condensed summary of differences between the target and behavior policies while retaining an accuracy comparable to baseline estimators. We also describe estimated values in terms of the prototypes to better understand which parts of the target policies have the most impact on the estimates. Using a simulator, we study the bias resulting from restricting models to use prototypes.

翻译：重要性抽样(IS)通常用于进行非政策性政策评价,但容易出现一些问题,特别是行为政策未知,必须从数据中估算。目标和行为政策之间的重大差异可能导致价值估计不确定,例如,由于差异大和未评估的行动。如果行为政策使用黑盒模型估算,那么很难诊断潜在问题并确定政策在建议的行动和结果价值中有哪些投入不同。为了解决这个问题,我们提议利用原型学习来估算IS的行为政策。我们在评估消毒治疗政策时采用这一方法,说明原型如何对目标和行为政策之间的差异进行压缩汇总,同时保留与基线估计值相近的准确性。我们还用原型来描述估计值,以便更好地了解目标政策中哪些部分对估计影响最大。我们用模拟器研究限制模型使用原型所产生的偏差。

0

相关内容

估计/估计量

估计/估计量

【SIGIR2020】多检索系统的贝叶斯推理风险评估，Bayesian Inferential Risk Evaluation On Multiple IR Systems

【SIGIR2020】多检索系统的贝叶斯推理风险评估，Bayesian Inferential Risk Evaluation On Multiple IR Systems

专知会员服务

9+阅读 · 2020年6月10日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

252+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

专知会员服务

49+阅读 · 2020年1月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

基于深度学习的医疗影像论文汇总（Deep Learning Papers on Medical Image Analysis）

基于深度学习的医疗影像论文汇总（Deep Learning Papers on Medical Image Analysis）

AI研习社

17+阅读 · 2017年10月21日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

Extreme events evaluation using CRPS distributions

Extreme events evaluation using CRPS distributions

Arxiv

0+阅读 · 2022年1月26日

On the Relationship between Explanation and Recommendation: Learning to Rank Explanations for Improved Performance

Arxiv

0+阅读 · 2022年1月25日

Deconfounding to Explanation Evaluation in Graph Neural Networks

Arxiv

0+阅读 · 2022年1月21日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Hyperparameter Selection for Imitation Learning

Arxiv

7+阅读 · 2021年5月25日

Learning Optimal Representations with the Decodable Information Bottleneck

Arxiv

6+阅读 · 2020年9月27日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Efficient Eligibility Traces for Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年10月23日

The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach

The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach

Arxiv

11+阅读 · 2018年7月12日

The Case for Automatic Database Administration using Deep Reinforcement Learning

Arxiv

3+阅读 · 2018年1月17日

VIP会员

文章信息

相关主题

估计/估计量

重要性采样

相关VIP内容

【SIGIR2020】多检索系统的贝叶斯推理风险评估，Bayesian Inferential Risk Evaluation On Multiple IR Systems

【SIGIR2020】多检索系统的贝叶斯推理风险评估，Bayesian Inferential Risk Evaluation On Multiple IR Systems

专知会员服务

9+阅读 · 2020年6月10日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

252+阅读 · 2020年4月19日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

【贝叶斯深度学习：一种基于模型的可解释方法】Bayesian deep learning: A model-based interpretable approach

专知会员服务

49+阅读 · 2020年1月1日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

全球AI工具市场发展现状与趋势分析2025

自动驾驶地图：全流程综述与前沿进展

协同智能体：多智能体人工智能系统如何变革军事训练及其他领域

【NeurIPS2025】TITAN：一种面向轨迹感知的大规模 VQE 自适应参数冻结技术

相关资讯

Successor representations 强化学习表示的生物学启发

Successor representations 强化学习表示的生物学启发

CreateAMind

6+阅读 · 2019年9月5日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【推荐】直接未来预测：增强学习监督学习

【推荐】直接未来预测：增强学习监督学习

机器学习研究会

6+阅读 · 2017年11月24日

基于深度学习的医疗影像论文汇总（Deep Learning Papers on Medical Image Analysis）

基于深度学习的医疗影像论文汇总（Deep Learning Papers on Medical Image Analysis）

AI研习社

17+阅读 · 2017年10月21日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

相关论文

Extreme events evaluation using CRPS distributions

Extreme events evaluation using CRPS distributions

Arxiv

0+阅读 · 2022年1月26日

On the Relationship between Explanation and Recommendation: Learning to Rank Explanations for Improved Performance

Arxiv

0+阅读 · 2022年1月25日

Deconfounding to Explanation Evaluation in Graph Neural Networks

Arxiv

0+阅读 · 2022年1月21日

Policy Gradient Bayesian Robust Optimization for Imitation Learning

Arxiv

5+阅读 · 2021年6月11日

Hyperparameter Selection for Imitation Learning

Arxiv

7+阅读 · 2021年5月25日

Learning Optimal Representations with the Decodable Information Bottleneck

Arxiv

6+阅读 · 2020年9月27日

Causal Discovery with Reinforcement Learning

Arxiv

4+阅读 · 2020年3月19日

Efficient Eligibility Traces for Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年10月23日

The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach

The Bottleneck Simulator: A Model-based Deep Reinforcement Learning Approach

Arxiv

11+阅读 · 2018年7月12日

The Case for Automatic Database Administration using Deep Reinforcement Learning

Arxiv

3+阅读 · 2018年1月17日

微信扫码咨询专知VIP会员