Near-Optimal Sample Complexity Bounds for Constrained MDPs - 专知 Papers


Sample complexity · Minimax · Learning · Constraints · Samples

June 13, 2022

Near-Optimal Sample Complexity Bounds for Constrained MDPs


Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

In contrast to the advances in characterizing the sample complexity for solving Markov decision processes (MDPs), the optimal statistical complexity for solving constrained MDPs (CMDPs) remains unknown. We resolve this question by providing minimax upper and lower bounds on the sample complexity for learning near-optimal policies in a discounted CMDP with access to a generative model (simulator). In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint. For (i), we prove that our algorithm returns an $\epsilon$-optimal policy with probability $1 - \delta$ by making $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^3 \epsilon^2}\right)$ queries to the generative model, thus matching the sample complexity for unconstrained MDPs. For (ii), we show that the algorithm's sample complexity is upper-bounded by $\tilde{O}\left(\frac{S A \log(1/\delta)}{(1 - \gamma)^5 \epsilon^2 \zeta^2}\right)$, where $\zeta$ is the problem-dependent Slater constant that characterizes the size of the feasible region. Finally, we prove a matching lower bound for the strict feasibility setting, thus obtaining the first near-minimax-optimal bounds for discounted CMDPs. Our results show that learning CMDPs is as easy as learning MDPs when small constraint violations are allowed, but inherently more difficult when we demand zero constraint violation.
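To make the gap between the two settings concrete, the following minimal sketch evaluates the leading terms of the two bounds quoted in the abstract. It is an illustration only: the function names and the example values are hypothetical, and the constants and polylogarithmic factors hidden by $\tilde{O}$ are dropped, so the outputs indicate scaling rather than actual query counts.

```python
import math


def relaxed_bound(S, A, gamma, eps, delta):
    """Leading term of the relaxed-feasibility bound:
    S * A * log(1/delta) / ((1 - gamma)^3 * eps^2).
    Constants and log factors hidden by O-tilde are ignored (assumption)."""
    return S * A * math.log(1.0 / delta) / ((1.0 - gamma) ** 3 * eps ** 2)


def strict_bound(S, A, gamma, eps, delta, zeta):
    """Leading term of the strict-feasibility bound:
    S * A * log(1/delta) / ((1 - gamma)^5 * eps^2 * zeta^2),
    where zeta is the Slater constant."""
    return S * A * math.log(1.0 / delta) / (
        (1.0 - gamma) ** 5 * eps ** 2 * zeta ** 2
    )


def strict_to_relaxed_ratio(gamma, zeta):
    """Ratio of the two leading terms: 1 / ((1 - gamma)^2 * zeta^2).
    S, A, eps and delta cancel out."""
    return 1.0 / ((1.0 - gamma) ** 2 * zeta ** 2)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    S, A, gamma, eps, delta, zeta = 100, 10, 0.9, 0.1, 0.05, 0.5
    print("relaxed:", relaxed_bound(S, A, gamma, eps, delta))
    print("strict: ", strict_bound(S, A, gamma, eps, delta, zeta))
    print("ratio:  ", strict_to_relaxed_ratio(gamma, zeta))
```

The ratio depends only on $\gamma$ and $\zeta$: the extra cost of demanding zero constraint violation scales as $1/((1-\gamma)^2 \zeta^2)$, so it grows as the discount factor approaches 1 or as the Slater constant shrinks.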

