In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which contains a multitude of both relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by the shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for an agent to successfully act in the world? We formulate this question concretely and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is shown in practice to discover the minimal control-endogenous latent state: the state that contains all of the information necessary for controlling the agent while fully discarding all irrelevant information. The algorithm consists of a multi-step inverse model (predicting the action taken at one time step from the current observation and an observation several steps later) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.
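As a rough illustration of why a multi-step inverse model with an information bottleneck singles out the control-endogenous state, the following Python sketch uses a hypothetical toy world: an agent moves left or right along a five-cell corridor (the part it controls), while an exogenous distractor value drifts on its own, independent of the agent's actions. This is not the AC-State implementation. AC-State learns its encoder from data, whereas here, as a simplifying assumption, we compare a few hand-written candidate encoders, fit a count-based inverse model that predicts a_t from the encoded pair (f(x_t), f(x_{t+k})) for k = 1..3, and stand in for the bottleneck by choosing the encoder with the fewest distinct latent values among those whose held-out accuracy is within a small tolerance of the best. All names (`rollout`, `select_encoder`, the encoder labels, the tolerance) are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy world: the agent controls its position on a corridor,
# while an exogenous distractor evolves independently of the actions.
N_POS = 5          # agent positions 0..4 (control-endogenous)
N_DISTRACT = 3     # distractor values (exogenous noise)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
MAX_K = 3          # largest step gap k used by the multi-step inverse model


def rollout(steps, rng):
    """Uniform-random policy; returns observations x_0..x_T and actions a_0..a_{T-1}."""
    pos, distract = rng.randrange(N_POS), rng.randrange(N_DISTRACT)
    obs, acts = [(pos, distract)], []
    for _ in range(steps):
        a = rng.choice(ACTIONS)
        pos = min(N_POS - 1, max(0, pos + (1 if a == 1 else -1)))  # walls clip moves
        distract = (distract + rng.choice((0, 1))) % N_DISTRACT     # ignores the action
        acts.append(a)
        obs.append((pos, distract))
    return obs, acts


def inverse_samples(obs, acts, encoder):
    """Yield ((f(x_t), f(x_{t+k}), k), a_t) pairs for k = 1..MAX_K."""
    for t, a in enumerate(acts):
        for k in range(1, MAX_K + 1):
            if t + k < len(obs):
                yield (encoder(obs[t]), encoder(obs[t + k]), k), a


def heldout_accuracy(encoder, train, test):
    """Fit a count-based multi-step inverse model on train; score it on test."""
    counts = defaultdict(Counter)
    for key, a in inverse_samples(*train, encoder):
        counts[key][a] += 1
    fallback = Counter(train[1]).most_common(1)[0][0]  # for keys never seen in train
    hits = total = 0
    for key, a in inverse_samples(*test, encoder):
        pred = counts[key].most_common(1)[0][0] if key in counts else fallback
        hits += pred == a
        total += 1
    return hits / total


# Hypothetical candidate encoders f(x); AC-State learns its encoder instead.
ENCODERS = {
    "full": lambda x: x,
    "agent_only": lambda x: x[0],
    "distractor_only": lambda x: x[1],
    "constant": lambda x: 0,
}


def select_encoder(train, test, tol=0.02):
    """Crude bottleneck stand-in: among encoders within tol of the best held-out
    accuracy, return the one with the fewest distinct latent values."""
    acc = {name: heldout_accuracy(f, train, test) for name, f in ENCODERS.items()}
    size = {name: len({f(x) for x in train[0]}) for name, f in ENCODERS.items()}
    best = max(acc.values())
    near_best = [name for name in ENCODERS if acc[name] >= best - tol]
    return min(near_best, key=lambda name: size[name]), acc


if __name__ == "__main__":
    rng = random.Random(0)
    train, test = rollout(3000, rng), rollout(3000, rng)
    chosen, acc = select_encoder(train, test)
    print(chosen, {name: round(v, 3) for name, v in acc.items()})
```

Under these assumptions, the agent-position encoder should predict actions about as well as the full observation while using fewer latent values, and the distractor-only encoder should do no better than chance, so the selection tends to keep only the part of the observation the agent controls. With k = 1 the action is fully recoverable from the agent's own positions; larger k makes some cases ambiguous, which is why accuracy falls below 1 when several gaps are pooled.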