Hyper-Decision Transformer for Efficient Online Policy Adaptation

Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning, but adapting them quickly to unseen tasks remains challenging. To address this, we propose Hyper-Decision Transformer (HDT), a framework that generalizes to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. HDT augments the base DT with an adaptation module whose parameters are initialized by a hyper-network. When encountering an unseen task, the hyper-network takes a handful of demonstrations as input and initializes the adaptation module accordingly. This initialization lets HDT adapt efficiently to novel tasks by fine-tuning only the adaptation module. We validate HDT's generalization capability on object manipulation tasks. With a single expert demonstration and fine-tuning of only 0.5% of DT parameters, HDT adapts to unseen tasks faster than fine-tuning the whole DT model. Finally, we explore a more challenging setting in which expert actions are not available, and show that HDT outperforms state-of-the-art baselines by a large margin in task success rate.
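
To make the data flow concrete, here is a minimal toy sketch of the idea in the abstract: a frozen base layer, a small adaptation module, and a hyper-network that produces the adapter's initial parameters from a few demonstrations. It is not the authors' implementation. Everything here is an assumption made for illustration: the names `HyperNetwork`, `AdaptedLayer`, `init_adapter`, and `trainable_fraction`, the use of a single linear layer in place of a transformer, mean-pooling of demonstration features, and the residual bottleneck form of the adapter.

```python
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class HyperNetwork:
    """Hypothetical hyper-network: demonstrations -> initial adapter weights."""

    def __init__(self, demo_dim, hidden, bottleneck, rng):
        self.hidden, self.bottleneck = hidden, bottleneck
        # One linear map that outputs all adapter weights as a flat vector.
        self.W = rand_matrix(2 * hidden * bottleneck, demo_dim, rng)

    def init_adapter(self, demos):
        # Assumption: mean-pool demonstration features into a task embedding.
        n = len(demos)
        task = [sum(d[i] for d in demos) / n for i in range(len(demos[0]))]
        flat = matvec(self.W, task)
        h, b = self.hidden, self.bottleneck
        down = [flat[r * h:(r + 1) * h] for r in range(b)]            # b x h
        off = b * h
        up = [flat[off + r * b:off + (r + 1) * b] for r in range(h)]  # h x b
        return down, up


class AdaptedLayer:
    """Frozen base weights plus a residual bottleneck adapter."""

    def __init__(self, base_W, down, up):
        self.base_W = base_W            # frozen during adaptation
        self.down, self.up = down, up   # the only parameters to fine-tune

    def forward(self, x):
        h = matvec(self.base_W, x)
        z = [max(0.0, v) for v in matvec(self.down, h)]  # ReLU bottleneck
        delta = matvec(self.up, z)
        return [a + d for a, d in zip(h, delta)]         # residual connection

    def trainable_fraction(self):
        n_base = sum(len(r) for r in self.base_W)
        n_adapter = sum(len(r) for r in self.down) + sum(len(r) for r in self.up)
        return n_adapter / (n_base + n_adapter)


rng = random.Random(0)
hidden, bottleneck, demo_dim = 8, 2, 4

base_W = rand_matrix(hidden, hidden, rng)
hyper = HyperNetwork(demo_dim, hidden, bottleneck, rng)

# A "handful of demonstrations", here three made-up feature vectors.
demos = [[rng.random() for _ in range(demo_dim)] for _ in range(3)]
down, up = hyper.init_adapter(demos)

layer = AdaptedLayer(base_W, down, up)
out = layer.forward([1.0] * hidden)
fraction = layer.trainable_fraction()
```

In this toy configuration the adapter accounts for a third of all parameters, which is far above the 0.5% reported in the abstract; the ratio shrinks as the frozen base model grows relative to the adapter's bottleneck. Fine-tuning on a new task would then update only `down` and `up`, starting from the hyper-network's output rather than from a random initialization.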
