Research on Genome-Wide Protein Function Prediction Based on Structure and Multi-Source Data Fusion

Project title: Research on Genome-Wide Protein Function Prediction Based on Structure and Multi-Source Data Fusion

Project number: No.61309010

Project type: Young Scientists Fund project

Year approved: 2014

Discipline: Automation technology; computer technology

Principal investigator: Deng Lei (邓磊)

Institution: Central South University

Funding: CNY 270,000

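Chinese abstract (translated): Uncovering the functions of the many thousands of proteins in living organisms is one of the most challenging areas of scientific research in the post-genomic era, and it is of great significance for understanding the mechanisms of life processes, for disease treatment, and for new drug development. With the rapid development of high-throughput sequencing technology, more and more genomes are being sequenced, and traditional experimental methods for identifying protein function fall far short of current needs. Building on a systematic analysis of protein three-dimensional structure and other functional clues, this project will combine methods and techniques from biology, mathematics, physics, and computer science to explore new techniques and algorithms for large-scale, genome-wide prediction of protein function. The research comprises: (1) building on research into high-performance protein structure alignment algorithms, establishing a new nonlinear function prediction model and improving prediction coverage through extensive use of homology models; (2) constructing a multi-clue fusion network, using complex-network methods to study its topology, analyzing network properties such as clustering and modularity, mining functional communities, and proposing new function prediction methods; (3) studying ensemble methods for large-scale protein function prediction; (4) developing an integrated protein function prediction platform and database to provide technical and data support for applications such as drug development.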

Chinese keywords (translated): protein function; machine learning; energy hot spots; structure alignment

English abstract: Exploring the functions of thousands of proteins is one of the most challenging areas of the post-genomic era and is of great significance for understanding life activities, disease treatment, and new drug development. With the rapid development of high-throughput sequencing technology, more and more genomes are being sequenced, and traditional experimental methods for identifying protein functions are far from meeting current demand. Based on a systematic analysis of protein 3D structures and other functional clues, and combining methods and techniques from biology, mathematics, physics, and computer science, this research explores new algorithms for large-scale, genome-wide prediction of protein functions. The study includes: (1) based on research into high-performance protein structure alignment methods, establishing new nonlinear function prediction models and improving prediction coverage through extensive use of homology models; (2) building an integrated multi-clue network, using complex-network methods to study its topology and frequent substructures, analyzing network properties such as the clustering coefficient and module partition, mining functional communities, and finally proposing new function prediction methods; (3) studying large-scale ensemble methods for protein function prediction; (4) developing an integrated protein function prediction platform and database to provide technical and data support for applications such as drug development.
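The network-based idea in item (2) can be illustrated with a minimal sketch. It assumes, hypothetically, that each evidence source (for example protein interactions or co-expression) is a plain list of protein pairs, that fusion simply counts how many sources support each edge, and that an unannotated protein receives the function labels most strongly supported by its neighbors. This weighted neighbor vote is a standard guilt-by-association baseline, not the project's own method; the protein IDs `P1` to `P5` and the labels `GO:A`, `GO:B` are invented placeholders. The sketch also computes the local clustering coefficient mentioned in the abstract.

```python
from collections import Counter, defaultdict
from itertools import combinations


def fuse_networks(sources):
    """Merge edge lists from several evidence sources into one weighted graph.

    Assumed fusion rule: edge weight = number of sources that support the edge.
    """
    weights = Counter()
    for edges in sources:
        # Normalize pair order, drop self-loops, and count each edge once per source.
        unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
        weights.update(unique_edges)
    graph = defaultdict(dict)
    for (a, b), w in weights.items():
        graph[a][b] = w
        graph[b][a] = w
    return graph


def clustering_coefficient(graph, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = list(graph.get(node, {}))
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def predict_functions(graph, annotations, node, top=1):
    """Score each label by the summed edge weight of annotated neighbors."""
    scores = Counter()
    for nbr, w in graph.get(node, {}).items():
        for term in annotations.get(nbr, ()):
            scores[term] += w
    return [term for term, _ in scores.most_common(top)]


# Hypothetical toy data: two evidence sources over five proteins.
ppi = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
coexpr = [("P1", "P2"), ("P4", "P5")]
fused = fuse_networks([ppi, coexpr])
annotations = {"P1": ["GO:A"], "P2": ["GO:A"], "P4": ["GO:B"], "P5": ["GO:B"]}

print(clustering_coefficient(fused, "P3"))
print(predict_functions(fused, annotations, "P3"))
```

In this toy example P3 has neighbors P1, P2, and P4; two of them carry `GO:A`, so the vote favors `GO:A`. The simple counting rule for fusion is only a placeholder; learned source weights, community detection, or the nonlinear models of item (1) would replace it in a real pipeline.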

English keywords: protein function; machine learning; hot spots; structure alignment
