Bayesian Data Synthesis and the Utility-Risk Trade-Off for Mixed Epidemiological Data

Much of the microdata used in epidemiological studies contains sensitive measurements on real individuals. As a result, such microdata cannot be published because of privacy concerns, which makes published statistical analyses of them nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for generating fully synthetic, high-dimensional microdatasets of mixed categorical, binary, count, and continuous variables. The process centers on a joint Bayesian model that is simultaneously compatible with all of these data types, so that mixed synthetic datasets can be created through posterior predictive sampling. Furthermore, a focal point of epidemiological data analysis is the study of conditional relationships between various exposures and key outcome variables through regression analysis. We design a modified data synthesis strategy to target and preserve these conditional relationships, including both nonlinearities and interactions. The proposed techniques are deployed to create a synthetic version of a confidential dataset containing dozens of health, cognitive, and social measurements on nearly 20,000 North Carolina children.
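
To make the idea of posterior predictive sampling concrete, the following is a minimal sketch and not the joint model described in the abstract. It assumes, purely for illustration, that each variable gets its own independent conjugate model (Beta-Bernoulli for a binary column, Gamma-Poisson for a count column, and a normal model with a noninformative prior for a continuous column), so cross-variable dependence is not preserved. The function name `synthesize`, the priors, and the toy input data are all hypothetical.

```python
import numpy as np


def synthesize(binary, counts, continuous, n_synth, rng):
    """Draw one synthetic dataset by posterior predictive sampling.

    Assumption (illustrative only): independent conjugate models per
    column, unlike the joint model the abstract describes.
    """
    n = len(binary)

    # Binary column: Beta(1, 1) prior on the success probability.
    p = rng.beta(1 + binary.sum(), 1 + n - binary.sum())

    # Count column: Gamma(shape=1, rate=1) prior on the Poisson rate.
    # Posterior is Gamma(shape=1 + sum, rate=1 + n); NumPy takes scale=1/rate.
    lam = rng.gamma(shape=1 + counts.sum(), scale=1.0 / (1 + n))

    # Continuous column: normal likelihood, prior p(mu, sigma^2) ∝ 1/sigma^2.
    m = len(continuous)
    s2 = continuous.var(ddof=1)
    sigma2 = (m - 1) * s2 / rng.chisquare(m - 1)
    mu = rng.normal(continuous.mean(), np.sqrt(sigma2 / m))

    # Posterior predictive draws: sample new records given the drawn parameters.
    return {
        "binary": rng.binomial(1, p, size=n_synth),
        "count": rng.poisson(lam, size=n_synth),
        "continuous": rng.normal(mu, np.sqrt(sigma2), size=n_synth),
    }


rng = np.random.default_rng(0)

# Toy stand-in for a confidential dataset (simulated, not real measurements).
confidential = {
    "binary": rng.binomial(1, 0.3, size=200),
    "counts": rng.poisson(4.0, size=200),
    "continuous": rng.normal(100.0, 15.0, size=200),
}

synth = synthesize(
    confidential["binary"],
    confidential["counts"],
    confidential["continuous"],
    n_synth=500,
    rng=rng,
)
```

Because the parameters are drawn from their posteriors before new records are generated, the synthetic records reflect parameter uncertainty rather than copying observed rows. A joint model would additionally capture the dependence between columns, which this sketch deliberately omits.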