在协调搜索中利用学习政策 (Exploiting Learned Policies in Focal Search)

Recent machine-learning approaches to deterministic search and domain-independent planning employ policy learning to speed up search. Unfortunately, when attempting to solve a search problem by successively applying a policy, no guarantees can be given on solution quality. The problem of how to effectively use a learned policy within a bounded-suboptimal search algorithm remains largely as an open question. In this paper, we propose various ways in which such policies can be integrated into Focal Search, assuming that the policy is a neural network classifier. Furthermore, we provide mathematical foundations for some of the resulting algorithms. To evaluate the resulting algorithms over a number of policies with varying accuracy, we use synthetic policies which can be generated for a target accuracy for problems where the search space can be held in memory. We evaluate our focal search variants over three benchmark domains using our synthetic approach, and on the 15-puzzle using a neural network learned using 1.5 million examples. We observe that \emph{Discrepancy Focal Search}, which we show expands the node which maximizes an approximation of the probability that its corresponding path is a prefix of an optimal path, obtains, in general, the best results in terms of runtime and solution quality.

翻译：最近确定性搜索和视域独立的规划的机械学习方法采用政策学习来加快搜索。不幸的是,在试图通过连续适用一项政策来解决搜索问题时,无法对解决方案的质量提供保证。如何在封闭的亚优最佳搜索算法中有效地使用所学政策的问题在很大程度上仍是一个未决问题。在本文件中,我们建议了将此类政策纳入焦点搜索的各种方法,假设该政策是一个神经网络分类器。此外,我们为由此产生的一些算法提供了数学基础。为了评估一系列政策产生的算法的准确性各不相同,我们使用合成政策来为搜索空间可以记忆的问题设定目标准确性。我们使用合成方法对三个基准域的焦点搜索变量进行评估,并利用利用150万个实例所学的神经网络对15个目标进行评估。我们观察到了\ emph{ 差异网络搜索},我们展示了扩大节点,以尽可能扩大其相应路径的概率的近似值,即其最佳路径是最佳路径的预置,在总体上获得最佳质量和最佳路径上的最佳结果。