Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). When more simulations are executed, MCTS can achieve higher performance, but it also requires enormous amounts of CPU and GPU resources. However, not all states require a long search time to identify the best action the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after searching for 2 minutes. This implies that a significant amount of resources can be saved if we can stop the search early when we are confident in the current search result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether to stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), a NoGo agent trained by AlphaZero runs 2.5 times faster while maintaining a similar winning rate. Moreover, under the same average simulation count, our method achieves a 61% winning rate against the original program.
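To make the early-stopping idea concrete, the sketch below illustrates one plausible shape of such a search loop: simulations are run in small batches, and after each batch an uncertainty predictor estimates whether the current best action is already reliable. This is only an illustrative sketch, not the paper's implementation; the names `mcts`, `uncertainty_net`, `run_simulations`, `search_features`, and the thresholds are all assumptions.

```python
def dynamic_simulation_search(mcts, uncertainty_net, state,
                              max_simulations=800, check_every=50,
                              confidence_threshold=0.95):
    """Run MCTS, stopping early once the predicted uncertainty is low.

    All interfaces here are hypothetical placeholders used to illustrate
    the control flow of an early-stopping MCTS, not DS-MCTS itself.
    """
    simulations_done = 0
    while simulations_done < max_simulations:
        # Run a batch of simulations from the current root state.
        mcts.run_simulations(state, check_every)
        simulations_done += check_every

        # Summarize the search status, e.g. the root visit-count distribution.
        features = mcts.search_features(state)

        # Predicted probability that further search would not change the
        # current best action; stop once we are sufficiently confident.
        confidence = uncertainty_net.predict(features)
        if confidence >= confidence_threshold:
            break

    return mcts.best_action(state)
```

Under this kind of scheme, easy states exit after a few batches while hard states continue to the full simulation budget, which is how average search cost can drop without hurting playing strength.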