Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data

Single-cell technologies generate high-dimensional point clouds of cells, enabling detailed characterization of complex patient states and treatment responses. Yet each patient is represented by an irregular point cloud rather than a simple vector, which makes it difficult to directly quantify and compare biological differences between individuals. Nonlinear methods such as kernel methods and neural networks can achieve high predictive accuracy, but they act as black boxes and offer little biological interpretability. To address these limitations, we adapt the Linear Optimal Transport (LOT) framework to this setting, embedding irregular point clouds into a fixed-dimensional Euclidean space while preserving distributional structure. The embedding is a principled linear representation that preserves optimal transport geometry and supports standard downstream analysis. It also defines a registration between any two patients, enabling direct comparison of their cellular distributions. Within this space, LOT enables (i) accurate and interpretable classification of COVID-19 patient states, where classifier weights map back to the specific markers and spatial regions driving predictions, and (ii) synthetic data generation for patient-derived organoids, exploiting the linearity of the LOT embedding. LOT barycenters yield averaged cellular profiles that represent combined conditions or samples, supporting drug interaction testing. Together, these results establish LOT as a unified framework that bridges predictive performance, interpretability, and generative modeling. By transforming heterogeneous point clouds into structured embeddings that are directly traceable to the original data, LOT opens new opportunities for understanding immune variation and treatment effects in high-dimensional biological systems.
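
The core LOT construction can be sketched in a few lines. Each patient's point cloud is transported to one fixed reference cloud, and the patient is represented by where each reference point r_i lands, T(r_i); stacking these rows gives an n × d array whose size is set by the reference, and row i of any two embeddings is registered to the same reference point. The sketch below is illustrative only, not the authors' implementation. It assumes equal-size clouds with uniform weights and squared Euclidean cost, so an optimal plan can be taken to be a permutation and found with the Hungarian algorithm (the general case with unequal cloud sizes needs a full transport plan and is not covered here). The helper names `lot_embed` and `lot_combine` and the toy data are hypothetical.

```python
"""Minimal LOT embedding sketch using only the standard library.

Assumptions (not from the paper): every point cloud has exactly n cells, e.g.
after subsampling, each cell has weight 1/n, and the cost is squared Euclidean
distance. Under these assumptions an optimal transport plan can be taken to be
a permutation, so a linear assignment solver replaces a general OT solver.
"""
from typing import List

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two cells in marker space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hungarian(cost: List[List[float]]) -> List[int]:
    """Solve a square assignment problem; returns assign[i] = column for row i."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-based)
    v = [0.0] * (n + 1)  # column potentials (1-based)
    p = [0] * (n + 1)    # p[j] = row matched to column j, 0 if unmatched
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def lot_embed(reference: List[Point], cloud: List[Point]) -> List[Point]:
    """Row i is T(r_i): the cell that reference point r_i is transported to."""
    cost = [[sq_dist(r, x) for x in cloud] for r in reference]
    assign = hungarian(cost)
    return [list(cloud[assign[i]]) for i in range(len(reference))]


def lot_combine(embeddings: List[List[Point]], weights: List[float]) -> List[Point]:
    """Row i is sum_k weights[k] * T_k(r_i), a weighted average in LOT space."""
    n, d = len(embeddings[0]), len(embeddings[0][0])
    return [
        [sum(w * emb[i][k] for w, emb in zip(weights, embeddings)) for k in range(d)]
        for i in range(n)
    ]


if __name__ == "__main__":
    reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    patient_a = [[0.1, 1.1], [1.2, 0.1], [0.0, 0.1]]
    patient_b = [[1.0, 2.0], [1.0, 1.0], [2.0, 1.0]]  # reference shifted by (1, 1)
    emb_a = lot_embed(reference, patient_a)
    emb_b = lot_embed(reference, patient_b)
    print(emb_a)  # [[0.0, 0.1], [1.2, 0.1], [0.1, 1.1]]
    print(lot_combine([emb_a, emb_b], [0.5, 0.5]))  # barycenter of the two samples
```

Equal weights give a LOT barycenter of the inputs, and weights (1 - t, t) with 0 < t < 1 interpolate between two samples, the kind of linear operation used for synthetic data. Note that averaging transport maps in this way approximates, rather than equals, the exact Wasserstein barycenter.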

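
The interpretability claim follows from linearity: if each embedding stacks n reference points × d markers, a linear classifier's weight vector has one entry per (reference point, marker) pair, and reshaping it shows which markers, and which regions of marker space (reference points), push a prediction toward one class. The sketch below illustrates this on toy, already-computed embeddings. It is a hypothetical illustration, not the paper's pipeline: it assumes row-major flattening, uses plain logistic regression trained by gradient descent as the linear classifier, and the data, marker names, and helpers (`weight_grid`, `marker_importance`, `region_importance`) are invented for the example.

```python
"""Sketch: mapping linear-classifier weights on LOT embeddings back to markers
and reference regions. Toy data and hypothetical marker names; plain logistic
regression stands in for whichever linear classifier is actually used.
"""
import math
from typing import List, Tuple

Vector = List[float]


def center(X: List[Vector]) -> List[Vector]:
    """Subtract the mean embedding so constant features carry no weight."""
    mean = [sum(col) / len(X) for col in zip(*X)]
    return [[x - mu for x, mu in zip(row, mean)] for row in X]


def train_logistic(X: List[Vector], y: List[int], lr: float = 0.5,
                   steps: int = 500) -> Tuple[Vector, float]:
    """Full-batch gradient descent on the mean logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            r = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            gb += r
            gw = [g + r * xj for g, xj in zip(gw, xi)]
        b -= lr * gb / len(X)
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
    return w, b


def weight_grid(w: Vector, n_ref: int, n_markers: int) -> List[Vector]:
    """Undo row-major flattening: grid[i][k] is the weight on marker k at reference point i."""
    return [w[i * n_markers:(i + 1) * n_markers] for i in range(n_ref)]


def marker_importance(grid: List[Vector]) -> Vector:
    """Total |weight| per marker, summed over reference points."""
    return [sum(abs(row[k]) for row in grid) for k in range(len(grid[0]))]


def region_importance(grid: List[Vector]) -> Vector:
    """Total |weight| per reference point (region of marker space), summed over markers."""
    return [sum(abs(v) for v in row) for row in grid]


if __name__ == "__main__":
    MARKERS = ["marker_A", "marker_B"]  # hypothetical marker names
    # Toy LOT embeddings, 2 reference points x 2 markers, flattened row-major:
    # [T(r0)_A, T(r0)_B, T(r1)_A, T(r1)_B]. Only the last coordinate separates the classes.
    X = [[0.0, 1.0, 0.5, 0.0], [0.2, 1.0, 0.5, 0.1], [0.1, 1.0, 0.5, 0.2],
         [0.0, 1.0, 0.5, 2.0], [0.2, 1.0, 0.5, 2.1], [0.1, 1.0, 0.5, 1.9]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic(center(X), y)
    grid = weight_grid(w, n_ref=2, n_markers=2)
    mi = marker_importance(grid)
    print(MARKERS[mi.index(max(mi))])  # marker_B
    print(region_importance(grid))     # reference point 1 dominates
```

Summing absolute weights is only one way to aggregate the grid; the signed entry `grid[i][k]` shows directly whether a transported cell with higher marker k near reference point i pushes toward the positive class.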