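Project title: Design and Application of Next-Generation De Novo Sequencing Algorithms
Project number: No.31470805
Project type: General Program
Approval year: 2015
Discipline: Psychology
Principal investigator: Hao Chi (迟浩)
Institution: Institute of Computing Technology, Chinese Academy of Sciences
Funding: 800,000 CNY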
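Abstract (translated from Chinese): De novo sequencing is one of the most important methods for analyzing biological mass spectrometry data. It infers peptide sequences without relying on database information, which gives it irreplaceable advantages over database searching. De novo sequencing is comparatively more difficult, however, and low sequencing accuracy and the lack of effective evaluation methods have long kept it from practical biological application. Drawing on techniques from information retrieval and machine learning, this project will examine in depth the shortcomings in each stage of de novo sequencing algorithms, improve their accuracy, develop stable and reliable algorithms for evaluating results, and further explore reliable sequence assembly techniques so that long peptides or even complete protein information can be obtained by de novo sequencing. The project also plans to investigate strategies for integrating de novo sequencing results with database search results, providing more comprehensive and effective methods for in-depth interpretation of mass spectrometry data, so that peptide de novo sequencing can be applied in practice to the in-depth analysis of high-accuracy biological mass spectrometry data and play a larger role in biological research. In short, starting from the application of information retrieval and machine learning, the project will improve every stage of de novo sequencing algorithms, with the aim of significantly increasing the quantity and quality of de novo sequencing results. Research of this kind has rarely been reported in China or abroad.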
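Keywords (translated from Chinese): biological mass spectrometry;de novo sequencing;sequence assembly;information retrieval;machine learning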
English abstract: De novo peptide sequencing is one of the most important methods for analyzing biological mass spectrometry data. Peptides are derived independently of any proteome database information, which gives de novo sequencing irreplaceable advantages over database searching. However, the algorithm design of de novo sequencing is more complicated; furthermore, low precision and the lack of effective evaluation approaches have long hindered the biological application of de novo sequencing. Therefore, we plan to investigate the defects in current de novo sequencing algorithms and then improve their performance using well-studied information retrieval and machine learning techniques. Novel algorithms for peptide assembly will then be developed in this study, so that longer peptides or even intact proteins can be sequenced successfully. In addition, strategies for integrating the results of de novo sequencing with those of database searching will be studied. The most important innovation in this study is that information retrieval and machine learning techniques will be comprehensively applied in every step of de novo sequencing. As a result, the quantity and quality of the sequenced peptides are expected to increase remarkably. Such related studies have rarely been reported in the research field of proteomics.
English keywords: biological mass spectrometry;de novo sequencing;peptide sequence assembly;information retrieval;machine learning