Quantification and cross-fitting inference of asymmetric relations under generative exposure mapping models

Learning directionality between variables is crucial yet challenging, especially for mechanistic relationships without a priori ordering assumptions. We propose a coefficient of asymmetry that quantifies directional asymmetry using Shannon's entropy within a generative exposure mapping (GEM) framework. GEMs arise from experiments in which a generative function $g$ maps an exposure $X$ to an outcome $Y$ through $Y = g(X)$; noise-perturbed GEMs extend this to $Y = g(X) + \epsilon$. Our approach covers a rich class of generative functions while providing statistical inference for uncertainty quantification, a gap in existing bivariate causal discovery techniques. We establish large-sample theoretical guarantees through data-splitting and cross-fitting techniques, and implement fast Fourier transform-based density estimation to avoid parameter tuning. The methodology accommodates contamination in outcome measurements. In extensive simulations, the method outperforms competing causal discovery methods. Applied to epigenetic data on the relationship between DNA methylation and blood pressure, our method unveils novel pathways for the cardiovascular disease genes \emph{FGF5} and \emph{HSD11B2}. This framework serves as a discovery tool for improving the rigor of scientific research, with GEM-induced asymmetry representing a low-dimensional imprint of the underlying causality.
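To make the ingredients above concrete, the following is a minimal sketch of a cross-fitted, entropy-based contrast between $X$ and $Y$ under a noise-free GEM. It is an illustration under stated assumptions, not the estimator developed in this work: the contrast $H(X) - H(Y)$ is one natural entropy-based choice assumed here, the function names `fft_kde`, `crossfit_entropy`, and `asymmetry` are hypothetical, and the bandwidth uses Silverman's rule as a stand-in for a tuning-free choice. The toy example takes $X \sim \mathrm{Uniform}(0,1)$ and $g(x) = e^{2x}$, for which $H(Y) = H(X) + E[\log g'(X)]$ gives $H(X) - H(Y) = -(\log 2 + 1)$.

```python
import numpy as np


def fft_kde(train, grid_size=1024):
    """Gaussian kernel density estimate on a regular grid via FFT convolution.

    Assumption of this sketch: the bandwidth follows Silverman's rule.
    """
    n = train.size
    h = 1.06 * train.std(ddof=1) * n ** (-0.2)
    lo, hi = train.min() - 4 * h, train.max() + 4 * h
    edges = np.linspace(lo, hi, grid_size + 1)
    counts, _ = np.histogram(train, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    dx = edges[1] - edges[0]
    # Zero-pad to twice the grid length so the circular convolution
    # does not wrap mass from one end of the grid to the other.
    m = 2 * grid_size
    omega = 2 * np.pi * np.fft.rfftfreq(m, d=dx)
    kernel_ft = np.exp(-0.5 * (h * omega) ** 2)  # Fourier transform of N(0, h^2)
    smoothed = np.fft.irfft(np.fft.rfft(counts, m) * kernel_ft, m)[:grid_size]
    density = np.maximum(smoothed / (n * dx), 1e-12)  # floor keeps the log finite
    return centers, density


def crossfit_entropy(z, rng):
    """Two-fold cross-fitted plug-in estimate of the differential entropy H(Z).

    The density is fitted on one fold and the negative log-density is averaged
    over the other fold; the two fold-wise estimates are then averaged.
    """
    folds = np.array_split(rng.permutation(z.size), 2)
    estimates = []
    for k in (0, 1):
        train, held_out = z[folds[k]], z[folds[1 - k]]
        centers, density = fft_kde(train)
        # np.interp clamps to the (near-zero) edge values outside the grid.
        log_f = np.log(np.interp(held_out, centers, density))
        estimates.append(-log_f.mean())
    return float(np.mean(estimates))


def asymmetry(x, y, rng):
    """Illustrative entropy contrast H(X) - H(Y); an assumption of this sketch."""
    return crossfit_entropy(x, rng) - crossfit_entropy(y, rng)


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 4000)
y = np.exp(2.0 * x)  # noise-free GEM with g(x) = exp(2x)
print("estimated H(X) - H(Y):", asymmetry(x, y, rng))
print("analytic  H(X) - H(Y):", -(np.log(2.0) + 1.0))
```

The estimate should sit near the analytic value, up to kernel smoothing and boundary bias. The sketch returns a point estimate only; the standard errors needed for the inference described above are not computed here.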
