Understanding which parts of a dynamical system causally influence one another is highly relevant to both fundamental and applied sciences. However, inferring causal links from observational data, that is, without directly manipulating the system, remains computationally challenging, especially when the data are high-dimensional. In this study we introduce a framework for constructing causal graphs from high-dimensional time series, whose computational cost scales linearly with the number of variables. The approach is based on the automatic identification of dynamical communities: groups of variables that mutually influence each other and can therefore be represented as a single node in a causal graph. These communities are identified efficiently by optimizing the Information Imbalance, a statistical quantity that assigns a weight to each putative causal variable based on its information content relative to a target variable. The communities are then ordered from the fully autonomous ones, whose evolution is independent of all the others, to those that depend progressively on other communities, thereby building a community causal graph. We demonstrate the computational efficiency and accuracy of our approach on time-discrete and time-continuous dynamical systems with up to 80 variables.
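To make the notion of a community causal graph concrete, the following is a minimal sketch of the grouping and ordering step only. It assumes, purely for illustration, that the variable-level influence links are already known and given as a dictionary; in the framework described above they are instead inferred from time series by optimizing the Information Imbalance, which this sketch does not attempt. Under that assumption, groups of mutually influencing variables correspond to strongly connected components, found here with Kosaraju's algorithm, and the autonomous-to-dependent ordering corresponds to a topological ordering of the condensed graph. The function names and the toy system are hypothetical.

```python
from collections import defaultdict


def find_communities(influences):
    """Group variables that mutually influence each other.

    `influences` maps each variable to the set of variables it directly
    influences (an assumed, already-known graph). Returns the communities in
    an order where every community appears after the ones that drive it,
    plus a variable -> community index mapping.
    """
    nodes = sorted(set(influences) | {v for ts in influences.values() for v in ts})
    forward = {n: sorted(influences.get(n, ())) for n in nodes}
    backward = {n: [] for n in nodes}
    for src in nodes:
        for dst in forward[src]:
            backward[dst].append(src)

    # Pass 1: depth-first search on the forward graph, recording finish order.
    visited, finish_order = set(), []

    def visit(n):
        visited.add(n)
        for m in forward[n]:
            if m not in visited:
                visit(m)
        finish_order.append(n)

    for n in nodes:
        if n not in visited:
            visit(n)

    # Pass 2: search the reversed graph in decreasing finish order.
    # Each search tree is one strongly connected component (one community).
    label, communities = {}, []

    def collect(n, idx):
        label[n] = idx
        communities[idx].append(n)
        for m in backward[n]:
            if m not in label:
                collect(m, idx)

    for n in reversed(finish_order):
        if n not in label:
            communities.append([])
            collect(n, len(communities) - 1)

    return [sorted(c) for c in communities], label


def order_communities(influences):
    """Build the community graph and assign each community a dependency level.

    Level 0 means autonomous (no incoming links from other communities);
    otherwise the level is one more than the highest level among its drivers.
    """
    communities, label = find_communities(influences)
    parents = defaultdict(set)
    for src, targets in influences.items():
        for dst in targets:
            if label[src] != label[dst]:
                parents[label[dst]].add(label[src])

    level = {}
    for idx in range(len(communities)):  # drivers always precede dependents
        level[idx] = 1 + max((level[p] for p in parents[idx]), default=-1)

    edges = sorted((p, c) for c, ps in parents.items() for p in ps)
    return communities, level, edges


# Hypothetical toy system: x1 and x2 drive each other, as do y1 and y2;
# the x pair drives the y pair, and both y2 and w drive z.
toy = {
    "x1": {"x2"},
    "x2": {"x1", "y1"},
    "y1": {"y2"},
    "y2": {"y1", "z"},
    "w": {"z"},
}
communities, level, edges = order_communities(toy)
for idx, members in enumerate(communities):
    print(f"community {idx}: {members}, level {level[idx]}")
print("community-level links:", edges)
```

In this toy example the x pair and w come out as autonomous (level 0), the y pair depends on the x pair (level 1), and z depends on both the y pair and w (level 2). The recursive search is adequate for small graphs like this one; a system with many variables would call for an iterative traversal.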