Dimensionality reduction methods, also known as projections, are frequently used to explore multidimensional data in machine learning, data science, and information visualization. Among these, t-SNE and its variants have become very popular for their ability to visually separate distinct data clusters. However, such methods are computationally expensive for large datasets, suffer from stability problems, and cannot directly handle out-of-sample data. We propose a learning approach to construct such projections. We train a deep neural network on a collection of samples from a given data universe and their corresponding projections, and then use the network to infer projections of data from the same, or similar, universes. Our approach generates projections with characteristics similar to those of the learned ones, is two to three orders of magnitude faster than SNE-class methods, has no complex-to-set user parameters, handles out-of-sample data in a stable manner, and can be used to learn any projection technique. We demonstrate our proposal on several real-world high-dimensional datasets from machine learning.
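The idea of learning a projection can be illustrated with a minimal sketch, which is not the authors' implementation. It fits a small regression network to pairs of high-dimensional samples and their 2-D coordinates, then applies it to unseen samples. Everything named here is an assumption made for illustration: `reference_projection` is a hypothetical stand-in for a precomputed embedding such as t-SNE output, `TinyMLP` has a single hidden layer rather than a deep architecture, and plain Python is used only so the example runs without dependencies.

```python
import math
import random


def reference_projection(x):
    # Hypothetical stand-in for precomputed 2-D coordinates (e.g. t-SNE output).
    return [math.tanh(x[0] + x[1]), math.tanh(x[2] - x[3])]


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by SGD on squared error."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        s1, s2 = 1 / math.sqrt(n_in), 1 / math.sqrt(n_hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        err = [yi - ti for yi, ti in zip(y, target)]  # gradient of 0.5*||y - t||^2
        # Backpropagate to the hidden layer before the output weights change.
        grad_h = [sum(err[k] * self.w2[k][j] for k in range(len(err))) * (1 - h[j] ** 2)
                  for j in range(len(h))]
        for k in range(len(err)):
            for j in range(len(h)):
                self.w2[k][j] -= lr * err[k] * h[j]
            self.b2[k] -= lr * err[k]
        for j in range(len(h)):
            for i in range(len(x)):
                self.w1[j][i] -= lr * grad_h[j] * x[i]
            self.b1[j] -= lr * grad_h[j]


def mean_error(net, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = net.predict(x)
        total += 0.5 * sum((pi - yi) ** 2 for pi, yi in zip(p, y))
    return total / len(xs)


rng = random.Random(0)


def sample(n):
    return [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(n)]


# Training set: samples plus their (stand-in) 2-D projections.
train_x = sample(100)
train_y = [reference_projection(x) for x in train_x]
# Out-of-sample data from the same "universe", never seen during training.
test_x = sample(50)
test_y = [reference_projection(x) for x in test_x]

net = TinyMLP(n_in=4, n_hidden=8, n_out=2, rng=rng)
err_before = mean_error(net, test_x, test_y)
for epoch in range(100):
    order = list(range(len(train_x)))
    rng.shuffle(order)
    for i in order:
        net.train_step(train_x[i], train_y[i], lr=0.05)
err_after = mean_error(net, test_x, test_y)

print("out-of-sample error before training:", err_before)
print("out-of-sample error after training: ", err_after)
```

Once trained, projecting a new sample is a single forward pass (`net.predict(x)`), which is why inference of this kind can be much cheaper than rerunning an iterative projection method on the enlarged dataset.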