UMAP has supplanted t-SNE as state-of-the-art for visualizing high-dimensional datasets in many disciplines, but the reason for its success is not well understood. In this work, we investigate UMAP's sampling-based optimization scheme in detail. We derive UMAP's effective loss function in closed form and find that it differs from the published one. As a consequence, we show that UMAP does not aim to reproduce its theoretically motivated high-dimensional UMAP similarities. Instead, it tries to reproduce similarities that only encode the shared $k$-nearest-neighbor graph, thereby challenging the previous understanding of UMAP's effectiveness. We claim instead that the key to UMAP's success is its implicit balancing of attraction and repulsion resulting from negative sampling. This balancing in turn facilitates optimization via gradient descent. We corroborate our theoretical findings on toy and single-cell RNA sequencing data.
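To make the sampling-based optimization scheme referred to above concrete, the following is a minimal sketch (not the reference implementation) of one epoch of UMAP-style stochastic gradient descent: each sampled edge of the $k$-nearest-neighbor graph contributes an attractive update, and a handful of uniformly drawn negative samples contribute repulsive updates. The function names, default parameters `a`, `b`, `n_neg`, and the learning rate are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def umap_sgd_epoch(Y, edges, n_neg=5, lr=1.0, a=1.0, b=1.0, rng=None):
    """One epoch of UMAP-style sampling-based optimization (illustrative sketch).

    Y     : (n, d) low-dimensional embedding, updated in place
    edges : iterable of (i, j) index pairs from the kNN graph (assumed input)
    n_neg : number of negative samples drawn per positive edge
    a, b  : parameters of the low-dimensional similarity 1 / (1 + a * d^(2b))
    """
    rng = rng or np.random.default_rng()
    n = len(Y)
    for i, j in edges:
        # Attraction: pull y_i toward y_j along the sampled kNN edge.
        diff = Y[i] - Y[j]
        d2 = diff @ diff
        grad_coeff = -2.0 * a * b * d2 ** (b - 1.0) / (a * d2 ** b + 1.0)
        Y[i] += lr * grad_coeff * diff
        Y[j] -= lr * grad_coeff * diff

        # Repulsion: push y_i away from n_neg uniformly sampled points.
        for _ in range(n_neg):
            k = rng.integers(n)
            if k == i:
                continue
            diff = Y[i] - Y[k]
            d2 = diff @ diff
            grad_coeff = 2.0 * b / ((0.001 + d2) * (a * d2 ** b + 1.0))
            Y[i] += lr * grad_coeff * diff
    return Y
```

Because repulsion is applied only to the `n_neg` negative samples per positive edge rather than to all non-neighbors, the effective weighting of attraction versus repulsion differs from the published loss; this implicit rebalancing is what the abstract identifies as central to UMAP's success.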