Controllable data generation aims to synthesize data by specifying values for target concepts. Achieving this reliably requires modeling the underlying generative factors and their relationships. In real-world scenarios, these factors exhibit both causal and correlational dependencies, yet most existing methods model only part of this structure. We propose the Causal-Correlation Variational Autoencoder (C2VAE), a unified framework that jointly captures causal and correlational relationships among latent factors. C2VAE organizes the latent space into a structured graph, identifying a set of root causes that govern the generative processes. By optimizing only the root factors relevant to target concepts, the model enables efficient and faithful control. Experiments on synthetic and real-world datasets demonstrate that C2VAE improves generation quality, disentanglement, and intervention fidelity over existing baselines.
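As a minimal illustration of what "optimizing only the root factors relevant to target concepts" could mean, the sketch below takes a causal graph over latent factors as given and collects the parentless factors that are ancestors of the chosen targets. It is not the C2VAE implementation: the graph, the factor names, and the function `relevant_roots` are all hypothetical, and the sketch covers only the causal part of the structure, leaving out correlational dependencies and how the graph is learned.

```python
from collections import deque


def relevant_roots(parents, targets):
    """Return the root factors (nodes with no parents) that are
    ancestors of, or identical to, any of the target factors.

    `parents` maps each factor name to an iterable of its direct causes.
    This is an assumed representation, not one taken from the paper.
    """
    roots = set()
    seen = set()
    queue = deque(targets)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        node_parents = parents.get(node, ())
        if not node_parents:
            # No direct causes: this factor is a root.
            roots.add(node)
        else:
            # Keep walking upstream toward the roots.
            queue.extend(node_parents)
    return roots


# Hypothetical toy graph: each key lists its direct causes.
toy_parents = {
    "age": [],
    "gender": [],
    "lighting": [],
    "beard": ["age", "gender"],
    "shadow": ["lighting"],
}

print(sorted(relevant_roots(toy_parents, ["beard"])))  # ['age', 'gender']
```

In this toy graph, controlling "beard" would involve only the roots "age" and "gender", and "lighting" would be left untouched.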