Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features that explains this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets that are maximally informative of cell type origin and enable whole-transcriptome reconstruction. MarkerMap provides a scalable framework for both supervised marker selection, aimed at identifying specific cell type populations, and unsupervised marker selection, aimed at gene expression imputation and reconstruction. We benchmark MarkerMap against previously published approaches on real single-cell gene expression data sets, where it performs competitively. MarkerMap is available as a pip-installable package and as a community resource aimed at developing explainable machine learning techniques for enhancing interpretability in single-cell studies.