We explore the limitations of, and best practices for, using black-box variational inference to estimate posterior summaries of the model parameters. By taking an importance sampling perspective, we are able to explain and empirically demonstrate: 1) why the intuitions about the behavior of approximate families and divergences for low-dimensional posteriors fail for higher-dimensional posteriors, 2) how we can diagnose the pre-asymptotic reliability of variational inference in practice by examining the behavior of the density ratios (i.e., importance weights), 3) why the choice of variational objective is not as important for higher-dimensional posteriors, and 4) why, although flexible variational families can provide some benefits in higher dimensions, they also introduce additional optimization challenges. Based on these findings, for high-dimensional posteriors we recommend using the exclusive KL divergence, which is the most stable and easiest to optimize, and then focusing on improving the variational family or using model parameter transformations to make the posterior more similar to the approximating family. Our results also show that in low to moderate dimensions, heavy-tailed variational families and mass-covering divergences can increase the chances that the approximation can be improved by importance sampling.
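As a concrete illustration of the diagnostic in 2), the sketch below is a minimal, assumed implementation rather than the paper's code: it draws from a hypothetical Gaussian approximation q, forms the density ratios against a toy target p, and fits a generalized Pareto distribution to the largest weights in the style of Pareto-smoothed importance sampling. The helper name `pareto_khat`, the tail-size heuristic, and the toy densities are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's implementation) of the
# importance-weight diagnostic: sample from the variational approximation q,
# form the density ratios w_i = p(theta_i) / q(theta_i), and fit a
# generalized Pareto distribution to the largest weights.
import numpy as np
from scipy.stats import genpareto

def pareto_khat(log_p, log_q, draws):
    """Estimate the Pareto shape k-hat of the importance weights.

    log_p, log_q : callables mapping an (S, D) array of draws to log
                   densities; log_p may be unnormalized (k-hat is
                   invariant to multiplicative constants in the weights).
    draws        : samples from q, shape (S, D).
    """
    log_w = log_p(draws) - log_q(draws)        # log density ratios
    log_w -= log_w.max()                       # stabilize before exponentiating
    w = np.sort(np.exp(log_w))
    S = len(w)
    m = int(min(0.2 * S, 3 * np.sqrt(S)))      # tail size heuristic (PSIS-style)
    u = w[-m - 1]                              # threshold: largest non-tail weight
    khat, _, _ = genpareto.fit(w[-m:] - u, floc=0.0)
    return khat

# Toy check with hypothetical densities: q = N(0, I) approximating a wider
# target p = N(0, sigma^2 I), i.e., an underdispersed approximation.
rng = np.random.default_rng(0)
S, D = 4000, 50
draws = rng.standard_normal((S, D))
sigma = 2.0
log_q = lambda x: -0.5 * np.sum(x**2, axis=1)
log_p = lambda x: -0.5 * np.sum((x / sigma)**2, axis=1)  # unnormalized log target
# For this Gaussian pair the weights have a power-law tail with shape
# k = 1 - 1/sigma**2 = 0.75, above the commonly used 0.7 reliability cutoff.
print(f"k-hat = {pareto_khat(log_p, log_q, draws):.2f}")
```

In practice one would run this check on the ratios between the unnormalized log joint and the fitted variational density: k-hat values below roughly 0.7, the cutoff commonly used with Pareto-smoothed importance sampling, indicate well-behaved weights, so the approximation can typically be further improved by importance sampling, while larger values signal a pre-asymptotically unreliable, heavy-tailed ratio.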