Graphical models, or networks, describe the statistical dependence among multiple variables and are widely used in biology (e.g., gene regulatory networks). Under appropriate assumptions, directed edges may represent causal relationships. A key feature of a biological network is sparsity, that is, how likely an edge is to be present, about which we often have some prior knowledge. However, most existing Bayesian methods place priors on the entire graph, making it difficult to specify the level of sparsity. The few methods that place priors on edges estimate the two directions independently, so the two direction probabilities for an edge can sum to more than 1. Here, we present baycn (BAYesian Causal Network), a novel approximate Bayesian method that represents a graph in terms of three edge states (the two directions and edge absence) and specifies priors on these states. We design a pseudo-Bayesian sampling algorithm for efficient inference. We apply baycn to two genomic problems: i) distinguishing direct from indirect target genes of genetic variants, using these variants as instrumental variables, and ii) inferring combinatorial binding of highly correlated transcription factors in Drosophila. In both applications and in extensive simulations, our method demonstrates much improved accuracy over existing methods, both for the whole graph and for individual edges.
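The three-state edge representation described above can be illustrated with a minimal sketch. This is not the baycn sampling algorithm. It only shows how a single categorical prior over the states (direction one, direction two, absence) keeps the probabilities for an edge summing to 1 and lets sparsity be set directly. The function names, the even split between the two directions, and the value 0.1 for edge presence are assumptions made for illustration.

```python
import random

# Three mutually exclusive states for a candidate edge between nodes u and v.
FORWARD, REVERSE, ABSENT = 0, 1, 2


def make_edge_prior(p_present):
    """Hypothetical helper: split the edge-presence probability evenly
    between the two directions; the remainder goes to edge absence."""
    if not 0.0 <= p_present <= 1.0:
        raise ValueError("p_present must be in [0, 1]")
    return (p_present / 2, p_present / 2, 1.0 - p_present)


def sample_graph(candidate_edges, prior, rng):
    """Draw one state per candidate edge from the categorical prior."""
    states = rng.choices(
        (FORWARD, REVERSE, ABSENT), weights=prior, k=len(candidate_edges)
    )
    return dict(zip(candidate_edges, states))


# Assumed sparsity level: each candidate edge is present with probability 0.1.
prior = make_edge_prior(0.1)
edges = [("A", "B"), ("B", "C"), ("A", "C")]
graph = sample_graph(edges, prior, random.Random(0))
print(prior, graph)
```

Because the three states share one distribution, lowering `p_present` makes every sampled graph sparser in expectation, which is the kind of edge-level prior knowledge the abstract says is hard to encode with whole-graph priors.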