Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems and bioinformatics. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed in the past. However, existing parallel NMF algorithms have not addressed data locality optimizations, which are critical for high performance since data movement costs greatly exceed the cost of arithmetic/logic operations on current computer systems. In this paper, we devise a parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality. Efficient realizations of the algorithm on multi-core CPUs and GPUs are developed, demonstrating significant performance improvement over existing state-of-the-art parallel NMF algorithms.
翻译:非负式矩阵系数(NMF)是广泛应用,包括专题模型、推荐系统和生物信息学等应用中使用的不受监督的减少维度的关键核心。由于必须反复执行NMF的应用程序的计算密集性,过去已经开发了一些平行的实施。但是,现有的平行NMF算法没有处理数据地点优化问题,这对于高性能至关重要,因为数据移动成本大大超过当前计算机系统的算术/逻辑操作成本。在本文中,我们根据HALS(高等级最小方程)设计了一个平行的NMF算法计划,纳入算法转换,以加强数据位置。发展了多核心CPU和GPU的算法的有效实现,表明现有最先进的平行NMF算法的绩效显著改善。