Nonlinear dimensionality reduction methods are a popular tool for data scientists and researchers who need to visualize complex, high-dimensional data. However, while these methods continue to improve and grow in number, it is often difficult to evaluate the quality of a visualization, for reasons such as a lack of information about the intrinsic dimension of the data and the additional tuning that many evaluation metrics require. In this paper, we provide a systematic comparison of dimensionality reduction quality metrics using datasets for which the ground-truth manifold is known. We use each metric for hyperparameter optimization in popular dimensionality reduction methods used for visualization, and we provide quantitative metrics to objectively compare the resulting visualizations to their original manifold. In our results, we find a few methods that appear to do well consistently, and we propose the best performer as a benchmark for evaluating dimensionality-reduction-based visualizations.
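The idea of using a quality metric to choose hyperparameters and then scoring the result against a known manifold can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Swiss roll dataset, t-SNE as the reduction method, scikit-learn's `trustworthiness` as the quality metric, the perplexity grid, and the hypothetical helper name `select_perplexity`. The ground-truth coordinates are approximated by the roll parameter `t` and the height axis.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import TSNE, trustworthiness


def select_perplexity(candidates=(5, 30, 50), n_samples=300, seed=0):
    """Pick the t-SNE perplexity whose embedding best preserves
    neighborhoods of the known 2-D manifold (illustrative only)."""
    # X is the 3-D Swiss roll; t is the position along the roll.
    X, t = make_swiss_roll(n_samples=n_samples, random_state=seed)

    # Assumed ground truth: the unrolled sheet, parameterized by
    # (t, height). This is an approximation, since t is not arc length.
    ground_truth = np.column_stack([t, X[:, 1]])

    scores = {}
    for perplexity in candidates:
        embedding = TSNE(
            n_components=2,
            perplexity=perplexity,
            init="pca",
            random_state=seed,
        ).fit_transform(X)
        # Score the embedding against the ground-truth coordinates
        # rather than against the 3-D input.
        scores[perplexity] = trustworthiness(
            ground_truth, embedding, n_neighbors=10
        )

    best = max(scores, key=scores.get)
    return best, scores


best, scores = select_perplexity()
print("selected perplexity:", best)
print("scores:", scores)
```

Scoring against the unrolled coordinates instead of the 3-D input is what the ground-truth setting makes possible: an embedding that keeps points close across adjacent layers of the roll is penalized, even though those points are near each other in the ambient space.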