Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains, such as Go and Atari games, when combined with deep neural networks (DNNs). When more simulations are executed, MCTS can achieve higher performance, but it also requires enormous amounts of CPU and GPU resources. However, not all states require a long search time to identify the best action the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by the DNN remains unchanged even after searching for 2 minutes. This implies that a significant amount of resources can be saved if we can stop the search early when we are confident in the current search result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current search status and using the result to decide whether to stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), a NoGo agent trained by AlphaZero runs 2.5 times faster while maintaining a similar winning rate. Moreover, under the same average simulation count, our method achieves a 61% winning rate against the original program.
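To make the early-stopping idea concrete, the sketch below illustrates one plausible shape of such a search loop: simulations are run in small batches, and after each batch an uncertainty predictor estimates whether the current best action is already reliable. This is only an illustrative sketch, not the paper's implementation; the names `mcts`, `uncertainty_net`, `run_simulations`, `search_features`, and the thresholds are all assumptions.

```python
def dynamic_simulation_search(mcts, uncertainty_net, state,
                              max_simulations=800, check_every=50,
                              confidence_threshold=0.95):
    """Run MCTS, stopping early once the predicted uncertainty is low.

    All interfaces here are hypothetical placeholders used to illustrate
    the control flow of an early-stopping MCTS, not DS-MCTS itself.
    """
    simulations_done = 0
    while simulations_done < max_simulations:
        # Run a batch of simulations from the current root state.
        mcts.run_simulations(state, check_every)
        simulations_done += check_every

        # Summarize the search status, e.g. the root visit-count distribution.
        features = mcts.search_features(state)

        # Predicted probability that further search would not change the
        # current best action; stop once we are sufficiently confident.
        confidence = uncertainty_net.predict(features)
        if confidence >= confidence_threshold:
            break

    return mcts.best_action(state)
```

Under this kind of scheme, easy states exit after a few batches while hard states continue to the full simulation budget, which is how average search cost can drop without hurting playing strength.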