A* search is an informed search algorithm that uses a heuristic function to guide the order in which nodes are expanded. Since the computation required to expand a node and compute the heuristic values for all of its generated children grows linearly with the size of the action space, A* search can become impractical for problems with large action spaces. This computational burden becomes even more apparent when heuristic functions are learned by general, but computationally expensive, deep neural networks. To address this problem, we introduce DeepCubeAQ, a deep reinforcement learning and search algorithm that builds on the DeepCubeA algorithm and deep Q-networks. DeepCubeAQ learns a heuristic function that, with a single forward pass through a deep neural network, computes the sum of the transition cost and the heuristic value of every child of a node without explicitly generating any of the children, eliminating the need for node expansions. DeepCubeAQ then uses a novel variant of A* search, called AQ* search, that uses the deep Q-network to guide search. We use DeepCubeAQ to solve the Rubik's cube formulated with a large action space that includes 1872 meta-actions. This 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time when performing AQ* search, and AQ* search is orders of magnitude faster than A* search.
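The key idea above, scoring all children of a node with one network call and queueing them without generating them, can be sketched as a best-first search. This is a minimal illustration, not the paper's implementation: the function and parameter names (`aq_star`, `q_net`, `step`) are assumptions, transition costs are taken to be 1, and `q_net(s)[a]` is assumed to already estimate the transition cost plus the heuristic value of the child reached by action `a`.

```python
import heapq
import itertools


def aq_star(start, is_goal, step, q_net, num_actions, weight=1.0):
    """Sketch of AQ*-style search: one q_net forward pass per popped node.

    q_net(s)[a] estimates cost(s, a) + h(step(s, a)), so children are
    pushed onto the frontier without ever being generated; a child state
    is materialized only if its (parent, action) entry is popped.
    """
    tie = itertools.count()  # breaks priority ties in the heap
    # Frontier entries: (priority, tie, g_of_child, parent_state, action).
    # A dummy entry with parent=None seeds the search with the start state.
    frontier = [(0.0, next(tie), 0.0, None, None)]
    best_g = {}     # cheapest known path cost to each materialized state
    came_from = {}  # state -> (parent_state, action) for path reconstruction
    while frontier:
        _, _, g, parent, action = heapq.heappop(frontier)
        s = start if parent is None else step(parent, action)
        if s in best_g and best_g[s] <= g:
            continue  # already reached this state at least as cheaply
        best_g[s] = g
        came_from[s] = (parent, action)
        if is_goal(s):
            path = []
            while came_from[s][0] is not None:
                parent, action = came_from[s]
                path.append(action)
                s = parent
            return path[::-1]
        q_vals = q_net(s)  # single forward pass scores all children
        for a in range(num_actions):
            # g + 1 assumes unit transition costs in this sketch;
            # q_vals[a] folds in both the transition cost and h(child).
            heapq.heappush(
                frontier, (g + weight * q_vals[a], next(tie), g + 1, s, a)
            )
    return None  # frontier exhausted without reaching a goal
```

Note that the per-node cost here is one `q_net` call regardless of `num_actions`, whereas a conventional A* expansion would generate every child and run the heuristic network on each one.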