To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference), a goal-prediction method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with the inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and are not limited to a bounded set. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
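The question-asking rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a finite set of candidate goal hypotheses (whereas the goals considered here may be unbounded), a known answer likelihood for each goal, and an interruption cost expressed in bits. All function and variable names are hypothetical.

```python
import math


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution given as a dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def posterior(belief, likelihood, answer):
    """Bayes update of the goal belief after observing `answer`."""
    unnorm = {g: belief[g] * likelihood[g].get(answer, 0.0) for g in belief}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}


def expected_information_gain(belief, likelihood):
    """Expected entropy reduction over goals from asking one question.

    belief:     {goal: P(goal)}
    likelihood: {goal: {answer: P(answer | goal)}}  (assumed known here)
    """
    answers = {a for g in likelihood for a in likelihood[g]}
    h_prior = entropy(belief)
    gain = 0.0
    for a in answers:
        # Marginal probability of hearing this answer under the current belief.
        p_a = sum(belief[g] * likelihood[g].get(a, 0.0) for g in belief)
        if p_a == 0.0:
            continue
        gain += p_a * (h_prior - entropy(posterior(belief, likelihood, a)))
    return gain


def should_ask(belief, likelihood, interruption_cost):
    """Ask only if the expected gain (bits) exceeds the assumed cost (bits)."""
    return expected_information_gain(belief, likelihood) > interruption_cost


# Hypothetical cooking example: is the human making a salad or a soup?
informative = {
    "salad": {"yes": 1.0, "no": 0.0},
    "soup": {"yes": 0.0, "no": 1.0},
}
uncertain = {"salad": 0.5, "soup": 0.5}
confident = {"salad": 0.99, "soup": 0.01}

print(should_ask(uncertain, informative, interruption_cost=0.5))
print(should_ask(confident, informative, interruption_cost=0.5))
```

Under these assumptions, the same question is worth asking when the belief over goals is spread out and not worth asking once observed actions have already made one goal dominant, which is the trade-off the interruption cost is meant to capture.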