Step-wise explanations can explain logic puzzles and other constraint satisfaction problems by showing how to derive decisions step by step. Each step consists of a set of constraints that derive an assignment to one or more decision variables. However, many candidate explanation steps exist, with different sets of constraints and different decisions they derive. To identify the most comprehensible one, a user-defined objective function is required to quantify the quality of each step. Defining a good objective function, however, is challenging. Interactive preference elicitation methods from the wider machine learning community offer a way to learn user preferences from pairwise comparisons. We investigate the feasibility of this approach for step-wise explanations and address several limitations that distinguish it from elicitation for standard combinatorial problems. First, because explanation quality is measured using multiple sub-objectives that can vary greatly in scale, we propose two dynamic normalization techniques to rescale these features and stabilize the learning process. We also observed that many generated comparisons involve similar explanations. For this reason, we introduce MACHOP (Multi-Armed CHOice Perceptron), a novel query generation strategy that integrates non-domination constraints with upper confidence bound-based diversification. We evaluate the elicitation techniques on Sudokus and Logic-Grid puzzles using artificial users, and validate them with a real-user evaluation. In both settings, MACHOP consistently produces higher-quality explanations than the standard approach.
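The core learning loop described above, a choice perceptron that learns sub-objective weights from pairwise comparisons, combined with dynamic min-max normalization, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the three-dimensional feature vectors, the mistake-driven update rule, the simulated user, and all constants are assumptions chosen for the sketch.

```python
import random


def minmax_normalize(x, lo, hi):
    """Rescale each sub-objective into [0, 1] using running bounds
    (a simple form of dynamic normalization)."""
    return [(xi - l) / (h - l) if h > l else 0.0
            for xi, l, h in zip(x, lo, hi)]


def utility(w, x):
    """Weighted sum of (normalized) sub-objective features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def choice_perceptron_update(w, preferred, rejected, lr=0.1):
    """Shift the weight vector toward the user's preferred explanation."""
    return [wi + lr * (p - r) for wi, p, r in zip(w, preferred, rejected)]


random.seed(0)
true_w = [0.7, 0.2, 0.1]           # hidden preference of the artificial user
w = [0.0, 0.0, 0.0]                # learned weights
lo, hi = [0.0] * 3, [1.0] * 3      # running per-feature bounds

for _ in range(500):
    # two candidate explanation steps, summarized as raw feature vectors
    a = [random.uniform(0, 10) for _ in range(3)]
    b = [random.uniform(0, 10) for _ in range(3)]

    # update running bounds, then normalize both candidates
    lo = [min(l, *v) for l, v in zip(lo, zip(a, b))]
    hi = [max(h, *v) for h, v in zip(hi, zip(a, b))]
    na, nb = minmax_normalize(a, lo, hi), minmax_normalize(b, lo, hi)

    # the artificial user prefers the higher-utility explanation
    pref, rej = (na, nb) if utility(true_w, na) > utility(true_w, nb) else (nb, na)

    # mistake-driven update: only adjust when the model disagrees with the user
    if utility(w, pref) <= utility(w, rej):
        w = choice_perceptron_update(w, pref, rej)
```

After enough comparisons, the learned weights rank random explanation pairs largely in agreement with the hidden user preference. MACHOP additionally biases which pairs are queried (non-dominated, UCB-diversified candidates), which this sketch does not model.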