We introduce transductive program synthesis, a new formulation of the program synthesis task that explicitly leverages test inputs during synthesis. Prior approaches to program synthesis, whether based on natural language descriptions or input-output examples, typically aim to generalize from training examples, but they often struggle with robustness, especially in real-world settings where training examples are limited and test inputs involve various edge cases. To address this, we propose a novel framework that improves robustness by treating synthesis as active learning over a finite hypothesis class defined by programs' outputs. We use an LLM to predict outputs for selected test inputs and eliminate inconsistent hypotheses, choosing the inputs via a greedy maximin algorithm to minimize the number of LLM queries required. We evaluate our approach on four benchmarks: Playgol, MBPP+, 1D-ARC, and programmatic world modeling on MiniGrid. We demonstrate that our method significantly improves program synthesis in both accuracy and efficiency. We release our code at https://github.com/klee972/SYNTRA.
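The core loop described above (select an informative test input, query the LLM for its output, discard hypotheses that disagree) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `oracle` stand-in for the LLM, and the specific maximin criterion (pick the input whose worst-case answer still eliminates the most candidate programs) are our assumptions for exposition.

```python
def select_input(programs, candidate_inputs):
    """Greedy maximin selection (illustrative): choose the test input that
    maximizes the worst-case number of hypotheses eliminated, i.e. minimizes
    the size of the largest group of programs agreeing on one output."""
    best_input, best_score = None, -1
    for x in candidate_inputs:
        groups = {}  # map: output value -> programs producing it on x
        for p in programs:
            groups.setdefault(p(x), []).append(p)
        worst_surviving = max(len(g) for g in groups.values())
        score = len(programs) - worst_surviving  # guaranteed eliminations
        if score > best_score:
            best_input, best_score = x, score
    return best_input

def transductive_filter(programs, test_inputs, oracle, budget):
    """Iteratively query an oracle (here standing in for the LLM's output
    prediction) on chosen inputs; keep only consistent hypotheses."""
    remaining = list(test_inputs)
    for _ in range(budget):
        if len(programs) <= 1 or not remaining:
            break
        x = select_input(programs, remaining)
        remaining.remove(x)
        y = oracle(x)  # LLM-predicted output for input x
        programs = [p for p in programs if p(x) == y]
    return programs
```

For example, with candidate programs `x + 1`, `2 * x`, and `x * x` and candidate inputs `[1, 2, 3]`, input `3` is selected first, since all three programs disagree there and a single query is guaranteed to eliminate two hypotheses.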