Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration - Zhuanzhi (专知) Papers

Estimation/Estimator · FPG · PG · Estimation error · Value function

April 15, 2022

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration


Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy. Conventional methods for off-policy PG estimation often suffer from either significant bias or exponentially large variance. In this paper, we propose the double Fitted PG estimation (FPG) algorithm. FPG can work with an arbitrary policy parameterization, assuming access to a Bellman-complete value function class. In the case of linear value function approximation, we provide a tight finite-sample upper bound on the policy gradient estimation error that is governed by the amount of distribution mismatch measured in feature space. We also establish the asymptotic normality of the FPG estimation error with a precise covariance characterization, which is further shown to be statistically optimal with a matching Cramér-Rao lower bound. Empirically, we evaluate the performance of FPG on both policy gradient estimation and policy optimization, using either softmax tabular or ReLU policy networks. Under various metrics, our results show that FPG significantly outperforms existing off-policy PG estimation methods based on importance sampling and variance reduction techniques.
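The abstract describes FPG only at a high level, so the following is a minimal sketch of one plausible reading of a "double fitted iteration", not the authors' implementation. It assumes a finite-horizon, time-homogeneous MDP, a tabular softmax policy, and a linear value class, and it fits two regressions per step: one for Q and one for the gradient of Q with respect to the policy parameters. The function names (`softmax_policy`, `grad_log_pi`, `fitted_pg`), the `(s, a, r, s_next)` data layout, and the small ridge term are all assumptions made for illustration.

```python
import numpy as np


def softmax_policy(theta):
    """Map logits theta of shape (S, A) to action probabilities of shape (S, A)."""
    z = theta - theta.max(axis=1, keepdims=True)  # stabilise the exponentials
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)


def grad_log_pi(pi):
    """Return G of shape (S, A, S, A) with G[s, a] = d log pi(a|s) / d theta."""
    S, A = pi.shape
    G = np.zeros((S, A, S, A))
    for s in range(S):
        # d log pi(a|s) / d theta[s, b] = 1[a == b] - pi(b|s)
        G[s, :, s, :] = np.eye(A) - pi[s]
    return G


def fitted_pg(data, phi, theta, xi, H, ridge=1e-8):
    """Sketch of a double fitted iteration for off-policy PG estimation.

    data : list of (s, a, r, s_next) transitions from any behavior policy
    phi  : (S, A, d) feature array for the linear value class
    theta: (S, A) softmax logits of the target policy
    xi   : (S,) initial state distribution
    H    : horizon
    """
    S, A, d = phi.shape
    m = S * A                                   # number of policy parameters
    pi = softmax_policy(theta)
    G = grad_log_pi(pi).reshape(S, A, m)
    s, a, r, s2 = (np.array(c) for c in zip(*data))
    X = phi[s, a]                               # (N, d) design matrix
    # Ridge-regularised least-squares operator shared by both regressions.
    solve = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T)  # (d, N)

    def backup(w, W):
        """State value and its gradient under pi, given Q and grad-Q weights."""
        Q = phi @ w                             # (S, A)
        dQ = phi @ W                            # (S, A, m)
        V = (pi * Q).sum(axis=1)
        dV = (pi[:, :, None] * (G * Q[:, :, None] + dQ)).sum(axis=1)
        return V, dV

    w = np.zeros(d)                             # Q weights beyond the horizon
    W = np.zeros((d, m))                        # grad-Q weights beyond the horizon
    for _ in range(H):
        V, dV = backup(w, W)
        w = solve @ (r + V[s2])                 # first fit: Q
        W = solve @ dV[s2]                      # second fit: gradient of Q
    _, dV = backup(w, W)
    return (xi @ dV).reshape(S, A)              # gradient estimate, same shape as theta
```

With one-hot features (`phi = np.eye(S * A).reshape(S, A, S * A)`) and a dataset that covers every state-action pair, this reduces to model-based dynamic programming on the empirical MDP. No importance weights appear anywhere, which is why a scheme of this kind avoids the trajectory-length-dependent variance that the abstract attributes to importance sampling.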

