Lexicographic Optimisation of Conditional Value at Risk and Expected Value for Risk-Averse Planning in MDPs - Zhuanzhi (专知) Papers

25 October 2021

Marc Rigter, Paul Duckworth, Bruno Lacerda, Nick Hawes

Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. We formulate the lexicographic optimisation problem of minimising the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on three domains, including a road navigation domain based on real traffic data. Our experimental results demonstrate that our lexicographic approach attains improved expected cost while maintaining the optimal CVaR.
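The lexicographic objective described in the abstract can be illustrated on a toy example. The sketch below is not the authors' algorithm: it assumes each candidate policy has already been reduced to a discrete distribution over total cost, and it assumes CVaR at level `alpha` means the mean cost over the worst `alpha` fraction of runs. The policy names, cost values, and helper functions (`expected_cost`, `cvar`, `lexicographic_choice`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A total-cost distribution as (cost, probability) pairs. Hypothetical data type.
Distribution = List[Tuple[float, float]]


def expected_cost(dist: Distribution) -> float:
    """Probability-weighted mean of the total cost."""
    return sum(cost * prob for cost, prob in dist)


def cvar(dist: Distribution, alpha: float) -> float:
    """Mean cost over the worst alpha-fraction of outcomes (high cost is bad).

    Assumes 0 < alpha <= 1 and that the probabilities sum to 1.
    """
    remaining = alpha
    total = 0.0
    # Walk from the most expensive outcome down, taking probability mass
    # until alpha of it has been collected (the last atom may be split).
    for cost, prob in sorted(dist, key=lambda cp: cp[0], reverse=True):
        take = min(prob, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def lexicographic_choice(
    policies: Dict[str, Distribution], alpha: float, tol: float = 1e-9
) -> str:
    """Among policies with optimal CVaR, return the one with lowest expected cost."""
    cvars = {name: cvar(dist, alpha) for name, dist in policies.items()}
    best = min(cvars.values())
    tied = [name for name, value in cvars.items() if value <= best + tol]
    return min(tied, key=lambda name: expected_cost(policies[name]))


# Hypothetical policies: A and B share the optimal CVaR at alpha = 0.1,
# while C has the lowest expected cost but a much worse tail.
policies: Dict[str, Distribution] = {
    "A": [(10.0, 0.1), (5.0, 0.9)],
    "B": [(10.0, 0.1), (2.0, 0.9)],
    "C": [(20.0, 0.1), (0.0, 0.9)],
}

chosen = lexicographic_choice(policies, alpha=0.1)
```

In this toy setting, minimising expected cost alone would select C, and minimising CVaR alone cannot distinguish A from B. The lexicographic rule breaks that tie by expected cost and selects B, which mirrors the motivation stated in the abstract.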

