Zero-Shot Assistance in Sequential Decision Problems

We consider the problem of creating assistants that can help agents solve new sequential decision problems, assuming the agent is not able to specify the reward function explicitly to the assistant. Instead of acting in place of the agent, as current automation-based approaches do, we give the assistant an advisory role and keep the agent in the loop as the main decision maker. The difficulty is that we must account for potential biases of the agent, which may cause it to reject advice in ways that seem irrational. To do this, we introduce a novel formalization of assistance that models these biases, allowing the assistant to infer and adapt to them. We then introduce a new method for planning the assistant's actions that can scale to large decision-making problems. We show experimentally that our approach adapts to these agent biases and results in higher cumulative reward for the agent than automation-based alternatives. Lastly, we show that an approach combining advice and automation outperforms advice alone, at the cost of losing some safety guarantees.
