Bayesian variable selection (BVS) depends critically on the prior distribution specified over the model space, which governs sparsity and multiplicity control. This paper examines the practical consequences of different model space priors for BVS in logistic regression, with an emphasis on streaming data settings. We review several widely used Beta--Binomial priors alongside the recently proposed matryoshka doll (MD) prior. We introduce a simple approximation to the MD prior that yields independent inclusion indicators and is convenient for scalable inference. With marginal likelihoods approximated via BIC, we compare in simulation studies the effect of different model space priors on posterior inclusion probabilities and coefficient estimation at intermediate and final stages of the data stream. Overall, the results indicate that no single model space prior uniformly dominates across scenarios, and that the MD prior provides a useful additional option that occupies an intermediate position between commonly used Beta--Binomial priors with differing degrees of sparsity.
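To make the roles of the model space prior and the BIC approximation concrete, the following minimal sketch computes posterior inclusion probabilities over an enumerated model space. It assumes the standard Beta--Binomial prior (inclusion probability drawn from Beta(a, b), indicators conditionally independent Bernoulli) and the usual approximation of the log marginal likelihood by -BIC/2. The function names and the toy BIC values are hypothetical and serve only as illustration; the MD prior is not implemented here because its form is not given in this summary.

```python
import math
from itertools import product


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_beta_binomial_prior(gamma, a=1.0, b=1.0):
    """Log prior probability of one model (a tuple of 0/1 indicators).

    Assumes pi ~ Beta(a, b) and gamma_j | pi ~ Bernoulli(pi), so that
    p(gamma) = B(a + k, b + p - k) / B(a, b) with k included predictors.
    """
    p = len(gamma)
    k = sum(gamma)
    return log_beta(a + k, b + p - k) - log_beta(a, b)


def posterior_inclusion_probs(bic, a=1.0, b=1.0):
    """Posterior inclusion probability of each predictor.

    `bic` maps every inclusion tuple to the BIC of that model; the log
    marginal likelihood is approximated by -BIC / 2.
    """
    log_post = {
        g: -0.5 * value + log_beta_binomial_prior(g, a, b)
        for g, value in bic.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post.values())
    weights = {g: math.exp(lp - top) for g, lp in log_post.items()}
    total = sum(weights.values())
    p = len(next(iter(bic)))
    return [
        sum(w for g, w in weights.items() if g[j]) / total
        for j in range(p)
    ]


# Hypothetical BIC values for all four models on two predictors.
toy_bic = {(0, 0): 120.0, (1, 0): 104.0, (0, 1): 119.5, (1, 1): 106.0}
print(posterior_inclusion_probs(toy_bic, a=1.0, b=1.0))
```

Changing `a` and `b` alters how strongly larger models are penalized, which is the mechanism through which Beta--Binomial priors with different degrees of sparsity yield different posterior inclusion probabilities for the same data.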