基于交叉验证特征值的图维度估计 (Estimating Graph Dimension with Cross-validated Eigenvalues)

In applied multivariate statistics, estimating the number of latent dimensions or the number of clusters, $k$, is a fundamental and recurring problem. We study a sequence of statistics called "cross-validated eigenvalues." Under a large class of random graph models, including both Poisson and Bernoulli edges, without parametric assumptions, we provide a $p$-value for each cross-validated eigenvalue. It tests the null hypothesis that the sample eigenvector is orthogonal to (i.e., uncorrelated with) the true latent dimensions. This approach naturally adapts to problems where some dimensions are not statistically detectable. In scenarios where all $k$ dimensions can be estimated, we show that our procedure consistently estimates $k$. In simulations and data example, the proposed estimator compares favorably to alternative approaches in both computational and statistical performance.

翻译：在应用多元统计学中，估计潜在维度数量或聚类数量$k$是一个基础且反复出现的问题。我们研究了一系列称为“交叉验证特征值”的统计量。在包含泊松边和伯努利边的广泛随机图模型类别下，无需参数假设，我们为每个交叉验证特征值提供了一个$p$值。该检验用于验证样本特征向量是否与真实潜在维度正交（即不相关）。此方法自然适用于某些维度在统计上不可检测的问题。在所有$k$个维度均可估计的场景中，我们证明该程序能一致地估计$k$。在模拟和实际数据示例中，所提出的估计器在计算性能和统计性能上均优于其他替代方法。

相关内容

交叉验证

关注 2

交叉验证，有时也称为旋转估计或样本外测试，是用于评估统计结果如何的各种类似模型验证技术中的任何一种分析将概括为一个独立的数据集。它主要用于设置，其目的是预测，和一个想要估计如何准确地一个预测模型在实践中执行。在预测问题中，通常会给模型一个已知数据的数据集，在该数据集上进行训练（训练数据集）以及未知数据（或首次看到的数据）的数据集（根据该数据集测试模型）（称为验证数据集或测试集）。交叉验证的目标是测试模型预测未用于估计数据的新数据的能力，以发现诸如过度拟合或选择偏倚之类的问题，并提供有关如何进行建模的见解。该模型将推广到一个独立的数据集（例如，未知数据集，例如来自实际问题的数据集）。一轮交叉验证涉及分割一个样品的数据到互补的子集，在一个子集执行所述分析（称为训练集），以及验证在另一子集中的分析（称为验证集合或测试集）。为了减少可变性，在大多数方法中，使用不同的分区执行多轮交叉验证，并将验证结果组合（例如取平均值）在各轮中，以估计模型的预测性能。总而言之，交叉验证结合了预测中适用性的度量（平均），以得出模型预测性能的更准确估计。

UnHiPPO：面向不确定性的状态空间模型初始化方法

专知会员服务

11+阅读 · 6月6日

【ICLR2022】GNN-LM基于全局信息的图神经网络语义理解模型

专知会员服务

21+阅读 · 2022年2月12日

NeurIPS 2021 | 寻找用于变分布泛化的隐式因果因子

专知会员服务

17+阅读 · 2021年12月7日

【ICML2021】随机傅立叶特征的量化算法

专知会员服务

25+阅读 · 2021年7月31日