ViralVectors: Compact and Scalable Alignment-free Virome Feature Generation


Virus · Feature generation · Alignment · Sequence · Biology

April 7, 2023


Sarwan Ali, Prakash Chourasia, Zahra Tayebi, Babatunde Bello, Murray Patterson

From arXiv; 24 pages, 5 figures; accepted to Springer Medical & Biological Engineering & Computing

The amount of sequencing data for SARS-CoV-2 is several orders of magnitude larger than for any other virus. This volume will continue to grow geometrically for SARS-CoV-2 and other viruses, as many countries heavily finance genomic surveillance efforts. Hence, we need methods for processing large amounts of sequence data to allow for effective yet timely decision-making. Such data will come from heterogeneous sources: aligned, unaligned, or even unassembled raw nucleotide or amino acid sequencing reads pertaining to the whole genome or regions of interest (e.g., spike). In this work, we propose ViralVectors, a compact feature vector generation method for virome sequencing data that allows effective downstream analysis. The generation is based on minimizers, a type of lightweight "signature" of a sequence traditionally used in assembly and read mapping; to our knowledge, this is the first use of minimizers in this way. We validate our approach on different types of sequencing data: (a) 2.5M SARS-CoV-2 spike sequences (to show scalability); (b) 3K Coronaviridae spike sequences (to show robustness to more genomic variability); and (c) 4K sets of raw WGS reads taken from nasal-swab PCR tests (to show the ability to process unassembled reads). Our results show that ViralVectors outperforms current benchmarks in most classification and clustering tasks.
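To make the idea of a minimizer-based feature vector concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the construction, so the sketch assumes one common definition (the lexicographically smallest m-mer inside each k-mer window) and assumes the feature vector is simply a count of each minimizer over a fixed alphabet. The function names `minimizers` and `feature_vector` and the default values of `k` and `m` are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product


def minimizers(seq, k=9, m=3):
    """Return one minimizer per k-mer window of seq.

    Assumption: the minimizer of a k-mer is its lexicographically
    smallest substring of length m (other definitions exist).
    """
    result = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mmers = [window[j:j + m] for j in range(k - m + 1)]
        result.append(min(mmers))
    return result


def feature_vector(seq, alphabet="ACGT", k=9, m=3):
    """Count minimizers into a fixed-length vector of size len(alphabet)**m.

    Assumption: a plain count vector; m-mers containing characters
    outside the alphabet are ignored.
    """
    # Map every possible m-mer to a fixed position in the vector.
    index = {"".join(p): pos for pos, p in enumerate(product(alphabet, repeat=m))}
    counts = Counter(minimizers(seq, k, m))
    vec = [0] * len(index)
    for mmer, count in counts.items():
        if mmer in index:
            vec[index[mmer]] = count
    return vec


if __name__ == "__main__":
    # Toy nucleotide sequence; real inputs would be spike sequences or reads.
    print(minimizers("ACGTAC", k=4, m=2))
    print(feature_vector("ACGTAC", k=4, m=2))
```

Because the vector length depends only on the alphabet size and `m`, not on the sequence length, sequences of different lengths (aligned or not) map to vectors of the same dimension, which is what makes such a representation usable as input to standard classifiers and clustering methods.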

