Object-Centric World Models for Causality-Aware Reinforcement Learning

World models have been developed to support sample-efficient deep reinforcement learning agents. However, it remains challenging for world models to accurately replicate environments that are high-dimensional, non-stationary, and composed of multiple objects with rich interactions, because most world models learn a single holistic representation of all environmental components. By contrast, humans perceive the environment by decomposing it into discrete objects, which facilitates efficient decision-making. Motivated by this insight, we propose \emph{Slot Transformer Imagination with CAusality-aware reinforcement learning} (STICA), a unified framework in which object-centric Transformers serve both as the world model and as causality-aware policy and value networks. STICA represents each observation as a set of object-centric tokens, together with tokens for the agent's action and the resulting reward, enabling the world model to predict token-level dynamics and interactions. The policy and value networks then estimate token-level cause--effect relations and use them in their attention layers, yielding causality-guided decision-making. Experiments on object-rich benchmarks demonstrate that STICA consistently outperforms state-of-the-art agents in both sample efficiency and final performance.
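
To make the idea of using token-level cause--effect relations inside an attention layer concrete, the following is a minimal sketch and not STICA's actual implementation. It assumes the estimated relations are available as a binary mask, which is only one possible reading of the description above; the abstract does not say whether the relations are hard or soft, or how they are estimated. The function name `masked_attention`, the token names, the toy embeddings, and the mask values are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(queries, keys, values, mask):
    """Single-head scaled dot-product attention restricted by a binary mask.

    mask[i][j] is True when token j is treated as a cause of token i
    (an assumption of this sketch), so token i may attend to token j.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for query, allowed in zip(queries, mask):
        if not any(allowed):
            raise ValueError("each token needs at least one allowed source")
        # Disallowed pairs get a score of -inf, hence zero attention weight.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            if ok else float("-inf")
            for key, ok in zip(keys, allowed)
        ]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output component at a time.
        out = [
            sum(w * value[c] for w, value in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical token set: two object slots, the action, and the reward.
names = ["object_0", "object_1", "action", "reward"]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

# Hypothetical cause-effect mask (rows: effect token, columns: cause token).
mask = [
    [True, False, True, False],   # object_0 <- object_0, action
    [True, True, False, False],   # object_1 <- object_0, object_1
    [False, False, True, False],  # action   <- action
    [False, True, False, True],   # reward   <- object_1, reward
]

outputs, weights = masked_attention(tokens, tokens, tokens, mask)
for name, row in zip(names, weights):
    print(name, [round(w, 3) for w in row])
```

In this sketch, masked pairs receive exactly zero attention weight, so each token's updated representation depends only on the tokens marked as its causes. A soft variant would instead add a learned bias to the scores rather than setting them to negative infinity.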
