On spectral algorithms for community detection in stochastic blockmodel graphs with vertex covariates

In network inference applications, it is often desirable to detect community structure, namely to cluster vertices into groups, or blocks, according to some measure of similarity. Beyond mere adjacency matrices, many real networks also involve vertex covariates that carry key information about underlying block structure in graphs. To assess the effects of such covariates on block recovery, we present a comparative analysis of two model-based spectral algorithms for clustering vertices in stochastic blockmodel graphs with vertex covariates. The first algorithm uses only the adjacency matrix, and directly estimates the block assignments. The second algorithm incorporates both the adjacency matrix and the vertex covariates into the estimation of block assignments, and moreover quantifies the explicit impact of the vertex covariates on the resulting estimate of the block assignments. We employ Chernoff information to analytically compare the algorithms' performance and derive the information-theoretic Chernoff ratio for certain models of interest. Analytic results and simulations suggest that the second algorithm is often preferred: we can often better estimate the induced block assignments by first estimating the effect of vertex covariates. Real data examples further indicate that the second algorithm has the advantages of revealing underlying block structure and taking observed vertex heterogeneity into account in real applications. Our findings emphasize the importance of distinguishing between observed and unobserved factors that can affect block structure in graphs.
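
The contrast between the two approaches can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not the algorithms as specified in the paper: it assumes a two-block model in which a binary covariate adds a constant `beta` to the edge probability whenever two vertices share the covariate value, it assumes covariates are independent of block membership (which is what makes the simple difference-in-densities estimate of `beta` reasonable), and it uses a hand-written two-cluster k-means in place of a model-based clustering step. All function names and parameter values are hypothetical.

```python
import numpy as np

def sample_graph(tau, z, B, beta, rng):
    """Sample a symmetric, hollow adjacency matrix from the assumed toy model
    P[i, j] = B[tau[i], tau[j]] + beta * 1{z[i] == z[j]}."""
    same = (z[:, None] == z[None, :]).astype(float)
    P = B[tau][:, tau] + beta * same
    upper = np.triu(rng.random(P.shape) < P, k=1).astype(float)
    return upper + upper.T

def ase(M, d):
    """Spectral embedding: top-d eigenpairs by magnitude,
    eigenvectors scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(M)
    top = np.argsort(-np.abs(vals))[:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def two_means(X, n_iter=50):
    """Lloyd's algorithm for two clusters with a deterministic
    farthest-point initialisation (a stand-in for model-based clustering)."""
    far = np.argmax(np.linalg.norm(X - X[0], axis=1))
    centers = np.stack([X[0], X[far]])
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def accuracy(pred, truth):
    """Two-label accuracy, invariant to swapping the label names."""
    agree = np.mean(pred == truth)
    return max(agree, 1.0 - agree)

# Assumed toy parameters: 2 blocks, binary covariate balanced within each block.
rng = np.random.default_rng(0)
n = 400
tau = np.repeat([0, 1], n // 2)          # true block labels
z = np.arange(n) % 2                     # observed binary covariate
B = np.array([[0.4, 0.1], [0.1, 0.4]])   # block connectivity
beta_true = 0.4                          # covariate effect

A = sample_graph(tau, z, B, beta_true, rng)

# Approach 1: embed the adjacency matrix only, then cluster.
acc_adjacency_only = accuracy(two_means(ase(A, 2)), tau)

# Approach 2: estimate the covariate effect, remove it, then embed and cluster.
same = (z[:, None] == z[None, :]).astype(float)
np.fill_diagonal(same, 0.0)
iu = np.triu_indices(n, k=1)
shared = same[iu].astype(bool)
beta_hat = A[iu][shared].mean() - A[iu][~shared].mean()
A_adjusted = A - beta_hat * same
acc_covariate_adjusted = accuracy(two_means(ase(A_adjusted, 2)), tau)

print("estimated beta:", round(float(beta_hat), 3))
print("accuracy, adjacency only:", acc_adjacency_only)
print("accuracy, covariate adjusted:", acc_covariate_adjusted)
```

In this particular toy setting the covariate signal is stronger than the block signal, so a two-dimensional embedding of the raw adjacency matrix tends to separate vertices by covariate rather than by block. Removing the estimated covariate effect first lets the embedding pick up the block structure. Other parameter choices can narrow or reverse the gap.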
