Mapping biological mechanisms in cellular systems is a fundamental step in early-stage drug discovery: it generates hypotheses about which disease-relevant molecular targets can be effectively modulated by pharmacological interventions. With the advent of high-throughput methods for measuring single-cell gene expression under genetic perturbations, we now have effective means of generating evidence for causal gene-gene interactions at scale. However, inferring graphical networks of the size typically encountered in real-world gene-gene interaction networks is difficult, both in achieving faithfulness to the true underlying causal graph and in evaluating it. Moreover, standardised benchmarks for comparing causal discovery methods on perturbational single-cell data do not yet exist. Here, we introduce CausalBench, a comprehensive benchmark suite for evaluating network inference methods on large-scale perturbational single-cell gene expression data. CausalBench introduces several biologically meaningful performance metrics and operates on two large, curated and openly available benchmark datasets for evaluating methods that infer gene regulatory networks from single-cell data generated under perturbations. With real-world datasets consisting of over 200,000 training samples under interventions, CausalBench could help facilitate advances in causal network inference by providing what is, to the best of our knowledge, the largest openly available test bed for causal discovery from real-world perturbation data to date.