Bayesian Model Selection for High-Dimensional Ising Models, With Applications to Educational Data

Doubly-intractable posterior distributions arise in many applications of statistics concerned with discrete and dependent data, including physics, spatial statistics, machine learning, the social sciences, and other fields. A specific example is psychometrics, which has adapted high-dimensional Ising models from machine learning, with a view to studying the interactions among binary item responses in educational assessments. To estimate high-dimensional Ising models from educational assessment data, $\ell_1$-penalized nodewise logistic regressions have been used. Theoretical results in high-dimensional statistics show that $\ell_1$-penalized nodewise logistic regressions can recover the true interaction structure with high probability, provided that certain assumptions are satisfied. Those assumptions are hard to verify in practice and may be violated, and quantifying the uncertainty about the estimated interaction structure and parameter estimators is challenging. We propose a Bayesian approach that helps quantify the uncertainty about the interaction structure and parameters without requiring strong assumptions, and can be applied to Ising models with thousands of parameters. We demonstrate the advantages of the proposed Bayesian approach compared with $\ell_1$-penalized nodewise logistic regressions by simulation studies and applications to small and large educational data sets with up to 2,485 parameters. Among other things, the simulation studies suggest that the Bayesian approach is more robust against model misspecification due to omitted covariates than $\ell_1$-penalized nodewise logistic regressions.
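
For orientation, the model class and the nodewise regressions mentioned above can be written out as follows. This is a sketch in one common parametrization and may differ from the one used in the paper: the symbols $p$ (number of items), $x_i \in \{0, 1\}$ (binary item responses), $\alpha_i$ (main effects), $\gamma_{ij}$ (pairwise interactions, with $\gamma_{ij} = \gamma_{ji}$), and $\lambda$ (penalty weight) are assumptions introduced here for illustration.

$$
\mathbb{P}_{\alpha,\gamma}(X = x) = \frac{1}{Z(\alpha,\gamma)} \exp\Big(\sum_{i=1}^{p} \alpha_i x_i + \sum_{i<j} \gamma_{ij} x_i x_j\Big),
\qquad
Z(\alpha,\gamma) = \sum_{x' \in \{0,1\}^p} \exp\Big(\sum_{i=1}^{p} \alpha_i x'_i + \sum_{i<j} \gamma_{ij} x'_i x'_j\Big).
$$

The normalizing constant $Z(\alpha,\gamma)$ sums over $2^p$ configurations, so the likelihood cannot be evaluated for large $p$. The posterior is then doubly intractable, because both $Z(\alpha,\gamma)$ and the posterior's own normalizing constant are unavailable. Each full conditional distribution, by contrast, is an ordinary logistic regression of one item on the others:

$$
\operatorname{logit} \mathbb{P}_{\alpha,\gamma}(X_i = 1 \mid x_{-i}) = \alpha_i + \sum_{j \neq i} \gamma_{ij} x_j .
$$

Under these assumptions, a nodewise approach fits this regression separately for each item $i$, subtracting a penalty $\lambda \sum_{j \neq i} |\gamma_{ij}|$ from the conditional log-likelihood so that some interactions are estimated as exactly zero. In this parametrization the model has $p + p(p-1)/2$ parameters.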
