Many critical applications, from autonomous scientific discovery to personalized medicine, demand systems that can both strategically acquire the most informative data and instantaneously perform inference based upon it. While amortized methods for Bayesian inference and experimental design offer part of the solution, neither approach is optimal in the most general and challenging task, where new data needs to be collected for instant inference. To tackle this issue, we introduce the Amortized Active Learning and Inference Engine (ALINE), a unified framework for amortized Bayesian inference and active data acquisition. ALINE leverages a transformer architecture trained via reinforcement learning with a reward based on self-estimated information gain provided by its own integrated inference component. This allows it to strategically query informative data points while simultaneously refining its predictions. Moreover, ALINE can selectively direct its querying strategy towards specific subsets of model parameters or designated predictive tasks, optimizing for posterior estimation, data prediction, or a mixture thereof. Empirical results on regression-based active learning, classical Bayesian experimental design benchmarks, and a psychometric model with selectively targeted parameters demonstrate that ALINE delivers both instant and accurate inference along with efficient selection of informative points.
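To make the reward idea concrete, the toy below sketches one way a "self-estimated information gain" signal could be computed. It is an illustration under stated assumptions, not ALINE's implementation: the abstract does not give the reward formula, so the sketch assumes the reward is the gain in log posterior density at the simulated ground-truth parameter after one new observation. A conjugate Gaussian model for y = theta * x + noise stands in for the transformer's inference component, and the names `posterior_update`, `log_density`, and `info_gain_reward` are hypothetical.

```python
import math


def posterior_update(mean, var, x, y, noise_var=1.0):
    """Conjugate Gaussian update for y = theta * x + noise (stand-in inference component)."""
    precision = 1.0 / var + x * x / noise_var
    new_var = 1.0 / precision
    new_mean = new_var * (mean / var + x * y / noise_var)
    return new_mean, new_var


def log_density(theta, mean, var):
    """Log density of a Gaussian N(mean, var) evaluated at theta."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (theta - mean) ** 2 / var


def info_gain_reward(theta_true, before, after):
    """Assumed reward: increase in log posterior density at the simulated parameter."""
    return log_density(theta_true, *after) - log_density(theta_true, *before)


# During simulated training the ground-truth parameter is known.
prior = (0.0, 1.0)  # (mean, variance) of the belief about theta
theta_true = 0.5

# Score three candidate queries; observations are noise-free to keep the toy deterministic.
rewards = {}
for x in (0.1, 1.0, 3.0):
    y = theta_true * x
    posterior = posterior_update(*prior, x, y)
    rewards[x] = info_gain_reward(theta_true, prior, posterior)

best_x = max(rewards, key=rewards.get)
print(rewards, best_x)
```

In this toy, queries with larger |x| shrink the posterior variance more and so earn a larger reward. A learned policy trained with reinforcement learning on such a signal would be pushed towards the more informative queries, which is the behaviour the abstract describes at a high level.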