Statistical models are central to machine learning with broad applicability across a range of downstream tasks. The models are controlled by free parameters that are typically estimated from data by maximum-likelihood estimation or approximations thereof. However, when faced with real-world datasets many of the models run into a critical issue: they are formulated in terms of fully-observed data, whereas in practice the datasets are plagued with missing data. The theory of statistical model estimation from incomplete data is conceptually similar to the estimation of latent-variable models, where powerful tools such as variational inference (VI) exist. However, in contrast to standard latent-variable models, parameter estimation with incomplete data often requires estimating exponentially-many conditional distributions of the missing variables, hence making standard VI methods intractable. We address this gap by introducing variational Gibbs inference (VGI), a new general-purpose method to estimate the parameters of statistical models from incomplete data. We validate VGI on a set of synthetic and real-world estimation tasks, estimating important machine learning models such as VAEs and normalising flows from incomplete data. The proposed method, whilst general-purpose, achieves competitive or better performance than existing model-specific estimation methods.