Matrix factorization, one of the most popular methods in machine learning, has recently benefited from the introduction of non-linearity in prediction tasks through the tropical semiring. This non-linearity enables a better fit to extreme values and distributions, thereby uncovering high-variance patterns that differ from those found with standard linear algebra. However, the optimization process of various tropical matrix factorization methods is slow. In our work, we propose FastSTMF, a new method based on Sparse Tropical Matrix Factorization (STMF) that introduces a novel strategy for updating the factor matrices and thereby achieves efficient computational performance. We evaluated the efficiency of FastSTMF on synthetic data and on real gene expression data from the TCGA database, and the results show that FastSTMF outperforms STMF in both accuracy and running time. Compared to NMF, we show that FastSTMF performs better on some datasets and is less prone to overfitting than NMF. This work lays the foundation for developing matrix factorization techniques over many other semirings using the newly proposed optimization process.
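In the tropical semiring, ordinary addition is replaced by a maximum and ordinary multiplication by addition, which is where the non-linearity comes from. The following is a minimal sketch, assuming the max-plus convention, of how two factor matrices U and V combine into an approximation of the data matrix; the function name `maxplus_product` is hypothetical and is not taken from the FastSTMF or STMF code.

```python
def maxplus_product(U, V):
    """Tropical (max-plus) product: (U ⊗ V)[i][j] = max_k (U[i][k] + V[k][j]).

    Hypothetical helper for illustration; assumes the max-plus semiring.
    """
    n, r = len(U), len(U[0])
    if len(V) != r:
        raise ValueError("inner dimensions must match")
    m = len(V[0])
    # Each entry is a maximum over the rank-r terms rather than a sum,
    # so the approximation is piecewise linear in the factors.
    return [[max(U[i][k] + V[k][j] for k in range(r)) for j in range(m)]
            for i in range(n)]


U = [[0, 1], [2, 0]]
V = [[1, 0], [0, 3]]
print(maxplus_product(U, V))  # [[1, 4], [3, 3]]
```

Because only the largest term contributes to each entry, a single factor entry can dominate a whole row or column of the approximation, which is one way to see why such models can capture extreme values differently from the linear product used by NMF.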