Learning to Trust: Bayesian Adaptation to Varying Suggester Reliability in Sequential Decision Making

Autonomous agents that make sequential decisions under uncertainty can benefit from external action suggestions, which offer valuable guidance but vary in reliability. Existing methods for incorporating such advice typically assume that suggester quality is static and known, which limits practical deployment. We introduce a framework that dynamically learns and adapts to varying suggester reliability in partially observable environments. First, we integrate suggester quality directly into the agent's belief representation, so that the agent can infer and adjust its reliance on suggestions through Bayesian inference over suggester types. Second, we introduce an explicit "ask" action that lets the agent strategically request suggestions at critical moments, balancing informational gains against acquisition costs. Experimental evaluation demonstrates robust performance across varying suggester qualities, adaptation to changing reliability, and strategic management of suggestion requests. This work provides a foundation for adaptive human-agent collaboration by addressing uncertainty about suggestions in uncertain environments.
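To make the first idea concrete, the following is a minimal sketch of a Bayesian update over a joint belief on (state, suggester type). It is not the paper's model. It assumes a small discrete set of suggester types, each summarized by a single hypothetical reliability value (the probability that a suggestion matches the best action for the true state), with the remaining probability spread uniformly over the other actions. All names (`suggestion_likelihood`, `update_joint_belief`, `marginal_reliability`) and all numbers are illustrative assumptions.

```python
def suggestion_likelihood(suggested, best_action, reliability, n_actions):
    """P(suggestion | best action in the true state, suggester reliability).

    Assumed noise model: the suggester names the best action with
    probability `reliability`, otherwise a uniformly random other action.
    """
    if suggested == best_action:
        return reliability
    return (1.0 - reliability) / (n_actions - 1)


def update_joint_belief(belief, suggested, best_action_by_state, n_actions):
    """Bayes update of a belief over (state, reliability) pairs."""
    unnormalized = {}
    for (state, reliability), prior in belief.items():
        likelihood = suggestion_likelihood(
            suggested, best_action_by_state[state], reliability, n_actions
        )
        unnormalized[(state, reliability)] = prior * likelihood
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("suggestion has zero probability under this belief")
    return {key: value / total for key, value in unnormalized.items()}


def marginal_reliability(belief):
    """Marginalize the joint belief onto the suggester reliability."""
    marginal = {}
    for (_, reliability), prob in belief.items():
        marginal[reliability] = marginal.get(reliability, 0.0) + prob
    return marginal


# Hypothetical two-state, two-action problem with two suggester types.
best_action_by_state = {"left": 0, "right": 1}
state_prior = {"left": 0.8, "right": 0.2}
type_prior = {0.5: 0.5, 0.9: 0.5}  # reliability value -> prior probability

belief = {
    (s, r): ps * pr
    for s, ps in state_prior.items()
    for r, pr in type_prior.items()
}

# The suggester recommends action 0, which agrees with the likelier state.
posterior = update_joint_belief(belief, 0, best_action_by_state, n_actions=2)
print(marginal_reliability(posterior))
```

In this toy setting, a suggestion that agrees with what the agent already believes shifts probability toward the more reliable type, while a conflicting suggestion shifts it away. Because the belief is joint, the same update also revises the state estimate, which is one simple way to read "integrating suggester quality into the belief representation". The "ask" action and its cost are not modeled here.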
