MCTS中静态和动态计算值 (Static and Dynamic Values of Computation in MCTS) - 专知论文

会员服务 ·

0

Performer · 蒙特卡洛树搜索方法 · 优化器 · state-of-the-art · 统计量 ·

2020 年 11 月 19 日

Static and Dynamic Values of Computation in MCTS

翻译：MCTS中静态和动态计算值

Eren Sezener,Peter Dayan

from arxiv, Presented in UAI 2020

Monte-Carlo Tree Search (MCTS) is one of the most-widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading-off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the "myopic" limitations of existing computation-value-based methods in two senses: (I) we are able to account for the impact of non-immediate (ie, future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.

翻译：Monte-Carlo树搜索(MCTS)是最广泛使用的规划方法之一,它使人造智能最近取得的许多进步成为动力。在MCTS中,人们通常进行计算(即模拟),收集行动今后可能产生的后果的统计数据,然后作出相应的选择。许多流行的 MCT方法,如UCT及其变体,决定通过交易勘探和开发进行哪些计算。在这项工作中,我们采取更直接的方法,根据对最终选择的行动的质量的预期影响,明确量化计算值。我们的方法超越了现有计算价值方法在两种意义上的“微小”限制:(I)我们有能力说明非中间(i,未来)计算(II)对非直接行动的影响。我们证明贪婪优化计算值的政策在某些假设下是最佳的,并获得与最新技术竞争的结果。

0

相关内容

Performer

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Applicability and Challenges of Deep Reinforcement Learning for Satellite Frequency Plan Design

Arxiv

0+阅读 · 2021年1月12日

Learning Augmented Index Policy for Optimal Service Placement at the Network Edge

Arxiv

0+阅读 · 2021年1月10日

Dynamic Weighted Matching with Heterogeneous Arrival and Departure Rates

Arxiv

0+阅读 · 2021年1月10日

A New Graph Parameter To Measure Linearity

Arxiv

0+阅读 · 2021年1月9日

An Optimization Framework for Power Infrastructure Planning

Arxiv

0+阅读 · 2021年1月9日

Dynamic Graph Collaborative Filtering

Arxiv

0+阅读 · 2021年1月8日

Auto-GNN: Neural Architecture Search of Graph Neural Networks

Auto-GNN: Neural Architecture Search of Graph Neural Networks

Arxiv

7+阅读 · 2019年9月10日

HAQ: Hardware-Aware Automated Quantization

HAQ: Hardware-Aware Automated Quantization

Arxiv

6+阅读 · 2018年11月21日

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月6日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

VIP会员

文章信息

相关主题

蒙特卡洛树搜索方法

state-of-the-art

相关VIP内容

不可错过！UIUC最新《统计强化学习》课程！

专知会员服务

53+阅读 · 2020年9月7日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

31+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

181+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

中文版3500字 | 俄乌前线启示：自主系统、人工智能驱动的后勤保障与量子增强型国防安全重新定义军事战略

《基于二元优化与图学习的多智能体行动方案自动生成》

中文版4000字 | 人工智能如何重塑自适应兵棋推演与战略战备

万字长文《针对无人机航电系统的电子战网络攻击、对抗措施与现代防御策略综述》

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

RL 真经

CreateAMind

5+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

强化学习族谱

强化学习族谱

CreateAMind

26+阅读 · 2017年8月2日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Applicability and Challenges of Deep Reinforcement Learning for Satellite Frequency Plan Design

Arxiv

0+阅读 · 2021年1月12日

Learning Augmented Index Policy for Optimal Service Placement at the Network Edge

Arxiv

0+阅读 · 2021年1月10日

Dynamic Weighted Matching with Heterogeneous Arrival and Departure Rates

Arxiv

0+阅读 · 2021年1月10日

A New Graph Parameter To Measure Linearity

Arxiv

0+阅读 · 2021年1月9日

An Optimization Framework for Power Infrastructure Planning

Arxiv

0+阅读 · 2021年1月9日

Dynamic Graph Collaborative Filtering

Arxiv

0+阅读 · 2021年1月8日

Auto-GNN: Neural Architecture Search of Graph Neural Networks

Auto-GNN: Neural Architecture Search of Graph Neural Networks

Arxiv

7+阅读 · 2019年9月10日

HAQ: Hardware-Aware Automated Quantization

HAQ: Hardware-Aware Automated Quantization

Arxiv

6+阅读 · 2018年11月21日

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Arxiv

5+阅读 · 2018年9月6日

Variational Bayesian Reinforcement Learning with Regret Bounds

Arxiv

3+阅读 · 2018年7月25日

微信扫码咨询专知VIP会员