一种用于深度优先启发式搜索中批量启发式操作的并行CPU-GPU框架 (A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search)

The rapid advancement of GPU technology has unlocked powerful parallel processing capabilities, creating new opportunities to enhance classic search algorithms. This hardware has been exploited in best-first search algorithms with neural network-based heuristics by creating batched versions of A* and Weighted A* that delay heuristic evaluation until sufficiently many states can be evaluated in parallel on the GPU. But, research has not addressed how depth-first algorithms like IDA* or Budgeted Tree Search (BTS) can have their heuristic computations batched. This is more complicated in a tree search, because progress in the search tree is blocked until heuristic evaluations are complete. In this paper we show that GPU parallelization of heuristics can be effectively performed when the tree search is parallelized on the CPU while heuristic evaluations are parallelized on the GPU. We develop a parallelized cost-bounded depth-first search (CB-DFS) framework that can be applied to both IDA* and BTS, significantly improving their performance. We demonstrate the strength of the approach on the 3x3 Rubik's Cube and the 4x4 sliding tile puzzle (STP) with both classifier-based and regression-based heuristics.

翻译：GPU技术的快速发展释放了强大的并行处理能力，为增强经典搜索算法创造了新的机遇。通过创建批处理版本的A*和加权A*算法，该硬件已在基于神经网络启发式的最佳优先搜索算法中得到利用，这些算法将启发式评估延迟至足够多的状态能够在GPU上并行评估时进行。然而，研究尚未解决如何对IDA*或预算树搜索（BTS）等深度优先算法进行启发式计算的批处理。在树搜索中，这一问题更为复杂，因为搜索树的进展在启发式评估完成前会被阻塞。本文展示了当树搜索在CPU上并行化而启发式评估在GPU上并行化时，可以有效地实现启发式的GPU并行化。我们开发了一种并行化成本有界深度优先搜索（CB-DFS）框架，该框架可应用于IDA*和BTS，显著提升了它们的性能。我们通过在3x3魔方和4x4滑块拼图（STP）上使用基于分类器和基于回归的启发式，验证了该方法的优势。