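Bayes Statistical Methods in High-Throughput Genetic Data Analysis

Project title: Bayes Statistical Methods in High-Throughput Genetic Data Analysis

Project number: No.10801123

Project type: Young Scientists Fund

Year of approval: 2009

Project discipline: Metallurgy and Metalworking

Principal investigator: Zhang Weiping

Institution: University of Science and Technology of China

Funding: 160,000 CNY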

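Chinese abstract (translated): This project aims to study Bayes statistical inference methods for high-throughput genetic data analysis. High-throughput genetic data, such as microarray gene expression data and single nucleotide polymorphism (SNP) marker data, have far more variables than observations and complex error structures, which pose new challenges and open new research directions for traditional statistical theory and methods. The use of prior information in Bayes statistics can augment and integrate information, smooth the data, and reduce its dimension. In addition, MCMC computation has by now largely resolved the computational difficulties of Bayes statistics. Together these make Bayes statistics especially well suited to statistical modeling and analysis of high-throughput genetic data. The project will start from linear models and generalized linear models, which are already widely used in high-throughput genetic data analysis, develop Bayes and empirical Bayes inference methods under robust priors and study their properties, and then study their application to microarray gene expression data and other high-throughput genetic data. For estimating haplotype probabilities from SNP data, the focus will be on applying hierarchical Bayes methods and developing fast computational methods. We plan to apply these methods to real data analysis.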

Chinese keywords (translated): High-throughput genetic data; linear model; generalized linear model; Bayes statistics

English abstract: The main purpose of this project is to study Bayes statistical inference methods in high-throughput genetic data analysis. High-throughput genetic data, such as microarray gene expression data and single nucleotide polymorphism (SNP) data, pose a great challenge and open new research directions for classical statistics because of the curse of dimensionality and complex error structure. In Bayes statistics, prior information can augment and integrate information, smooth the data, and reduce its dimension. In addition, carefully crafted Markov chain Monte Carlo (MCMC) algorithms executed on today's fast computers are able to solve a phenomenal range of computing problems in Bayes statistical inference. Together these make Bayes statistics particularly attractive for modelling and analyzing high-throughput genetic data. In this project, we first study the linear model and generalized linear model, which are widely used in high-throughput genetic data analysis, develop Bayes and empirical Bayes approaches under robust priors and obtain their properties, and then study their application in gene microarray data analysis. For the estimation of SNP haplotypes, we will focus on applying hierarchical Bayes methods and developing efficient algorithms. We will apply the newly developed methods to real data analysis.

English keywords: High-throughput genetic data; linear model; generalized linear model; Bayes statistics
