With the increasing affordability and availability of patient data, hospitals tend to outsource their data to cloud service providers (CSPs) for storage and analytics. However, data privacy concerns significantly limit the data owners' choices. In this work, we propose what is, to the best of our knowledge, the first solution that allows a CSP to efficiently identify target patients (e.g., as pre-processing for a genome-wide association study, GWAS) over multi-tenant encrypted phenotype data (owned by multiple hospitals or data owners). We first propose an encryption mechanism for phenotype data in which each data owner encrypts its data with its own unique secret key. The resulting ciphertext supports privacy-preserving search and, consequently, enables the selection of target groups of patients (e.g., case and control groups). In addition, we provide a per-query authorization mechanism for a client to access and operate on the data stored at the CSP. Based on the identified patients, the proposed scheme can either (i) conduct the GWAS (i.e., compute statistics about genomic variants) directly at the CSP or (ii) provide the identified groups to the client, which can then query the corresponding data owners directly and conduct the GWAS using existing distributed solutions. We implement the proposed scheme and run experiments over a real-life genomic dataset to show its effectiveness. The results show that the proposed solution can efficiently identify the case/control groups in a privacy-preserving way.