This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function $\nabla \log p(\mathbf{X})$, we extend the work of Rolland et al. (2022), which recovers only the topological order from the score and requires an expensive pruning step to remove spurious edges among those admitted by the ordering. Our analysis leads to DAS (an acronym for Discovery At Scale), a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves accuracy competitive with the current state of the art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar.
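To make the role of the second derivative concrete, the following toy sketch uses a hypothetical three-variable chain $X_1 \to X_2 \to X_3$ with unit-variance Gaussian noise and hand-picked mechanisms ($\sin$ and $\tanh$); none of these choices come from the paper. Because the density is known in closed form here, the Hessian of $\log p$ is obtained by finite differences rather than by the learned score approximation that DAS relies on, so this illustrates the underlying idea and is not the authors' implementation. Under these assumptions, the diagonal Hessian entry of a leaf is constant across samples, and the off-diagonal entries in the leaf's row are non-zero only for its parents. The threshold `0.05` is an arbitrary illustrative value.

```python
import math
import random


def log_p(x):
    """Unnormalised log-density of the assumed chain X1 -> X2 -> X3
    with unit-variance additive Gaussian noise (toy model, not from the paper)."""
    x1, x2, x3 = x
    return -0.5 * (x1 ** 2 + (x2 - math.sin(x1)) ** 2 + (x3 - math.tanh(x2)) ** 2)


def hessian_entry(f, x, i, j, h=1e-3):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at point x."""
    def shifted(di, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)


def sample(rng):
    """Draw one observation from the assumed structural model."""
    x1 = rng.gauss(0, 1)
    x2 = math.sin(x1) + rng.gauss(0, 1)
    x3 = math.tanh(x2) + rng.gauss(0, 1)
    return [x1, x2, x3]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
points = [sample(rng) for _ in range(200)]

# Leaf detection: the diagonal Hessian entry of a leaf does not vary across samples.
diag_var = [variance([hessian_entry(log_p, x, i, i) for x in points]) for i in range(3)]
leaf = min(range(3), key=lambda i: diag_var[i])

# Parent detection: inspect the off-diagonal entries in the leaf's Hessian row.
cross = [sum(abs(hessian_entry(log_p, x, leaf, j)) for x in points) / len(points)
         for j in range(3)]
parents = [j for j in range(3) if j != leaf and cross[j] > 0.05]  # 0.05: illustrative threshold

print("leaf index:", leaf)           # zero-based, so 2 corresponds to X3
print("parents of leaf:", parents)   # zero-based, so 1 corresponds to X2
```

In this sketch the non-parent $X_1$ is excluded directly from the Hessian row of the leaf, without a separate pruning procedure over every edge admitted by the ordering. With estimated rather than exact derivatives, a fixed threshold would no longer suffice and some form of statistical test would be needed.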