t-SNE is a popular tool for embedding high-dimensional datasets into two or three dimensions. However, it has a large computational cost, especially when the input data has many dimensions. To keep this cost manageable, many practitioners apply t-SNE to the output of a neural network, which generally has much lower dimension than the original data. Depending on a trained network in this way limits the use of t-SNE in unsupervised scenarios. We propose using \textit{random} projections to embed high-dimensional datasets into relatively few dimensions, and then using t-SNE to obtain a two-dimensional embedding. We show that random projections preserve the desirable clustering achieved by t-SNE while dramatically reducing the runtime of finding the embedding.
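The two-stage pipeline described above (random projection, then t-SNE) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say which kind of random projection is used, so a dense Gaussian matrix is assumed here. The intermediate dimension `k=50`, the helper names `random_project` and `rp_tsne`, and the synthetic two-cluster data are likewise hypothetical choices. The t-SNE step relies on scikit-learn's `TSNE`.

```python
import numpy as np
from sklearn.manifold import TSNE


def random_project(X, k, seed=0):
    """Project the rows of X onto k dimensions with a Gaussian random matrix.

    Assumption: a dense Gaussian projection; the source does not specify one.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Entries drawn from N(0, 1/k), so squared distances are preserved
    # in expectation after projection.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R


def rp_tsne(X, k=50, seed=0):
    """Random projection to k dimensions, then t-SNE down to two dimensions."""
    X_low = random_project(X, k, seed)
    return TSNE(n_components=2, random_state=seed).fit_transform(X_low)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data (assumption): two Gaussian blobs in 1000 dimensions.
    X = np.vstack([
        rng.normal(0.0, 1.0, size=(100, 1000)),
        rng.normal(3.0, 1.0, size=(100, 1000)),
    ])
    Y = rp_tsne(X, k=50)
    print(Y.shape)
```

Because t-SNE here only ever sees the `k`-dimensional projected data, the cost of its pairwise-distance computations depends on `k` rather than on the original dimension. How small `k` can be before cluster structure degrades depends on the dataset and would need to be checked empirically.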