We provide a complete taxonomic characterization of robust hierarchical clustering methods for directed networks following an axiomatic approach. We begin by introducing three practical properties associated with the notion of robustness in hierarchical clustering: linear scale preservation, stability, and excisiveness. Linear scale preservation enforces imperviousness to change in units of measure whereas stability ensures that a bounded perturbation in the input network entails a bounded perturbation in the clustering output. Excisiveness refers to the local consistency of the clustering outcome. Algorithmically, excisiveness implies that we can reduce computational complexity by only clustering a subset of our data while theoretically guaranteeing that the same hierarchical outcome would be observed when clustering the whole dataset. In parallel to these three properties, we introduce the concept of representability, a generative model for describing clustering methods through the specification of their action on a collection of networks. Our main result is to leverage this generative model to give a precise characterization of all robust -- i.e., excisive, linear scale preserving, and stable -- hierarchical clustering methods for directed networks. We also address the implementation of our methods and describe an application to real data.
翻译:我们对定向网络的稳健等级集群方法进行完全的分类学定性,采用不言而喻的方法。我们首先采用与等级集群稳健性概念有关的三种实际属性:线性规模保存、稳定性和分辨性。线性规模保存强制不易改变计量单位,而稳定性则确保输入网络的捆绑性扰动在组合输出中产生受约束的扰动。精确性是指集群结果的当地一致性。分辨性、分辨性意味着我们只能通过将我们的数据组合成一组来降低计算复杂性,同时在理论上保证在整个数据集组合时也观察到同样的等级结果。在这三个属性的同时,我们引入了可显示性概念,即一个通过对网络集成行动进行规范来描述集群方法的典型模型。我们的主要结果是利用这一基因化模型来精确地描述所有稳健性 -- -- 即精细度、线性规模保存和稳定的等级组合方法。我们还处理我们方法的实施,并描述对定向网络实际数据的应用。