For sparse high-dimensional regression problems, Cox and Battey [1, 9] emphasised the need for confidence sets of models: an enumeration of those small sets of variables that fit the data equivalently well in a suitable statistical sense. This is to be contrasted with the single model returned by penalised regression procedures, which is effective for prediction but potentially misleading for subject-matter understanding. The proposed construction of such sets relied on a preliminary reduction of the full set of variables. While various possibilities could be considered for this reduction, [9] proposed a succession of regression fits based on incomplete block designs. The purpose of the present paper is to provide insight into both aspects of that work. For an unspecified reduction strategy, we begin by characterising models that are likely to be retained in the model confidence set, emphasising geometric aspects. We then evaluate possible reduction schemes based on penalised regression or marginal screening, before theoretically elucidating the reduction of [9]. We identify features of the covariate matrix that may reduce its efficacy, and indicate improvements to the original proposal. An advantage of the approach is its ability to reveal its own stability or fragility for the data at hand.