Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning

Reinforcement learning (RL) enables robots to operate in uncertain environments, but standard approaches often generalize poorly to unseen tasks. Context-adaptive meta reinforcement learning (meta-RL) addresses this limitation by conditioning on a task representation, yet most such methods rely on complete action information in the experience, which couples task inference tightly to a specific policy. This paper introduces Context Representation via Action-Free Transformer encoder-decoder (CRAFT), a belief model that infers task representations solely from sequences of states and rewards. By removing the dependence on actions, CRAFT decouples task inference from policy optimization, supports modular training, and leverages amortized variational inference for scalable belief updates. Built on a transformer encoder-decoder with rotary positional embeddings, the model captures long-range temporal dependencies and robustly encodes both parametric and non-parametric task variations. Experiments on the MetaWorld ML-10 robotic manipulation benchmark show that CRAFT achieves faster adaptation, improved generalization, and more effective exploration than context-adaptive meta-RL baselines. These findings highlight the potential of action-free inference as a foundation for scalable RL in robotic control.
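To make two ingredients of the abstract concrete, the following is a minimal sketch, not CRAFT's implementation. It shows (a) context tokens built only from states and rewards, with actions left out, and (b) a rotary positional embedding applied to those tokens. The function names `make_tokens`, `rope`, and `dot` are hypothetical, and the base of 10000 is the commonly used default rather than a value taken from the paper. A real model would presumably project each token through learned layers before attention; the sketch skips that step.

```python
import math

def make_tokens(states, rewards):
    """Build one context token per step from (state, reward) only.

    Actions are deliberately excluded, so the tokens do not depend on
    which policy produced the trajectory's action choices.
    """
    return [list(s) + [float(r)] for s, r in zip(states, rewards)]

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("rotary embedding needs an even dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Toy trajectory: 3-dimensional states plus a scalar reward give 4-dim tokens.
states = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
rewards = [1.0, 0.0]
tokens = make_tokens(states, rewards)
q, k = tokens

# The score between two rotated tokens depends only on their position offset.
score_near_start = dot(rope(q, 5), rope(k, 2))
score_shifted = dot(rope(q, 13), rope(k, 10))
```

Both scores use the same offset of 3 steps, so they agree up to floating-point error. This relative-position property is one reason rotary embeddings are a common choice for long sequences.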
