With the advent of high-throughput sequencing (HTS) in molecular biology and medicine, scalable statistical methods for modeling complex biological systems have become critically important. The growing number of platforms and possible experimental scenarios has raised the problem of integrating large amounts of new, heterogeneous data with current knowledge, in order to test novel hypotheses and improve our understanding of physiological processes and diseases. Although network theory provides a framework for representing biological systems and studying their hidden properties, existing algorithms still suffer from low reproducibility and robustness, dependence on user-defined settings, and poor interpretability. Here we discuss the R package SEMgraph, which combines network analysis and causal inference within the framework of structural equation modeling (SEM). It provides a fully automated toolkit that manages complex biological systems as multivariate networks and ensures robustness and reproducibility through data-driven evaluation of model architecture and perturbation, yielding results that are readily interpretable in terms of causal effects among system components. In addition, SEMgraph offers functions for perturbed path finding and model reduction, as well as parallelization options for the analysis of large interaction networks.