Humans leverage rich internal models of the world to reason about the future, imagine counterfactuals, and adapt flexibly to new situations. In Reinforcement Learning (RL), world models aim to capture how the environment evolves in response to the agent's actions, facilitating planning and generalization. However, typical world models operate directly on environment variables (e.g., pixels or physical attributes), which can make their training slow and cumbersome; it may instead be advantageous to rely on high-level latent dimensions that capture the relevant multimodal variables. Global Workspace (GW) Theory offers a cognitive framework for multimodal integration and information broadcasting in the brain, and recent studies have begun to introduce efficient deep-learning implementations of GW. Here, we evaluate the capabilities of an RL system that combines a GW with a world model. We compare our GW-Dreamer against several variants of the standard PPO algorithm and the original Dreamer algorithm. We show that performing the dreaming process (i.e., mental simulation) inside the GW latent space allows training with fewer environment steps. As an additional emergent property, the resulting model (but not the comparison baselines) displays strong robustness to the absence of one of its observation modalities (images or simulation attributes). We conclude that combining GW with world models holds great potential for improving decision-making in RL agents.
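To make the core idea concrete, the following is a minimal PyTorch sketch of what dreaming inside a GW latent space could look like: each observation modality is encoded into a shared workspace latent, and imagined rollouts then unfold entirely in that latent space, without querying the environment. All class, module, and parameter names (`GWDreamerSketch`, `vision_enc`, `attr_enc`, `gw_dim`, `horizon`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of "dreaming" in a Global Workspace (GW) latent space.
# Module names and dimensions are illustrative placeholders, not the
# authors' architecture.

class GWDreamerSketch(nn.Module):
    def __init__(self, img_dim=1024, attr_dim=32, gw_dim=64, action_dim=4):
        super().__init__()
        self.vision_enc = nn.Linear(img_dim, gw_dim)      # image modality -> GW
        self.attr_enc = nn.Linear(attr_dim, gw_dim)       # attributes -> GW
        self.transition = nn.GRUCell(action_dim, gw_dim)  # latent dynamics
        self.policy = nn.Linear(gw_dim, action_dim)       # acts on GW latent

    def encode(self, img=None, attrs=None):
        # Either modality alone can populate the workspace; this is the
        # property that makes the agent robust to a missing input modality.
        if img is not None:
            return torch.tanh(self.vision_enc(img))
        return torch.tanh(self.attr_enc(attrs))

    def dream(self, z, horizon=15):
        # Roll out an imagined trajectory purely in the GW latent space:
        # no environment steps, no pixels, no simulator attributes.
        states = []
        for _ in range(horizon):
            action = torch.tanh(self.policy(z))
            z = self.transition(action, z)
            states.append(z)
        return torch.stack(states)

# Usage: encode from images only (attributes absent), then dream.
model = GWDreamerSketch()
img = torch.randn(8, 1024)      # a batch of image features
z0 = model.encode(img=img)      # workspace filled from one modality
imagined = model.dream(z0)      # (horizon, batch, gw_dim), env-free rollout
```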