This technical note considers the sampling of outcomes that provide the greatest amount of information about the structure of underlying world models. This generalisation furnishes a principled approach to structure learning under a plausible set of generative models or hypotheses. In active inference, policies (i.e., combinations of actions) are selected based on their expected free energy, which comprises expected information gain and value. Information gain corresponds to the KL divergence between predictive posteriors with, and without, the consequences of action. Posteriors over models can be evaluated quickly and efficiently using Bayesian model reduction, based upon accumulated posterior beliefs about model parameters. The ensuing information gain can then be used to select actions that disambiguate among alternative models, in the spirit of optimal experimental design. We illustrate this kind of active selection or reasoning using partially observed discrete models; namely, a 'three-ball' paradigm used previously to describe artificial insight and 'aha moments' via (synthetic) introspection or sleep. We focus on the sample efficiency afforded by seeking outcomes that resolve the greatest uncertainty about the world model under which outcomes are generated.
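The claim that posteriors over models can be scored "quickly and efficiently" rests on the fact that, for conjugate (here, Dirichlet) parameterisations, Bayesian model reduction yields the log evidence of a reduced model analytically from the full model's prior and posterior counts, with no refitting. The following is a minimal sketch of that identity for a single categorical likelihood with Dirichlet parameters; the function names and the example counts are illustrative assumptions, not part of the note itself.

```python
import numpy as np
from scipy.special import gammaln

def log_beta(alpha):
    """Log multivariate beta function: the (log) normaliser of a
    Dirichlet distribution with concentration parameters alpha."""
    return np.sum(gammaln(alpha)) - gammaln(np.sum(alpha))

def bmr_delta_f(prior, posterior, reduced_prior):
    """Log evidence of a reduced model relative to the full model,
    computed analytically from Dirichlet counts (Bayesian model
    reduction). The reduced posterior follows from the identity
    b' = b + a' - a, so no new inference is required."""
    reduced_posterior = posterior + reduced_prior - prior
    return (log_beta(prior) + log_beta(reduced_posterior)
            - log_beta(posterior) - log_beta(reduced_prior))

# Illustrative usage: a flat full prior, a posterior after observing
# outcome 0 twenty times, and two reduced models (hypotheses) that
# concentrate prior mass on different outcomes.
a = np.array([1.0, 1.0, 1.0])          # full prior counts
b = np.array([21.0, 1.0, 1.0])         # accumulated posterior counts
r1 = np.array([2.0, 0.5, 0.5])         # hypothesis favouring outcome 0
r2 = np.array([0.5, 2.0, 0.5])         # hypothesis favouring outcome 1

# The hypothesis consistent with the data should score higher.
print(bmr_delta_f(a, b, r1) > bmr_delta_f(a, b, r2))
```

Comparing these relative log evidences across a set of reduced models gives the posterior over models from which the expected information gain of each action can then be evaluated.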