Sparse matrix-vector multiplication is widely used in data-analytic workloads in which low latency and high throughput are more valuable than exact numerical convergence. FPGAs offer fast execution and, through reduced-precision fixed-point arithmetic, precise control over the accuracy of the results. In this work, we propose a novel streaming implementation of Coordinate Format (COO) sparse matrix-vector multiplication, and study its effectiveness when applied to the Personalized PageRank algorithm, a common building block of recommender systems in e-commerce websites and social networks. Our implementation achieves speedups of up to 6x over a reference floating-point FPGA architecture and a state-of-the-art multi-threaded CPU implementation on 8 different datasets, while preserving the numerical fidelity of the results and reaching up to 42x higher energy efficiency compared to the CPU implementation.
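To make the computation concrete, the following is a minimal software sketch of COO sparse matrix-vector multiplication in fixed-point arithmetic, used as the inner step of Personalized PageRank. It is not the streaming FPGA architecture proposed in this work; it only illustrates the arithmetic. The 20-bit fractional width, the damping factor of 0.85, the iteration count, all function names, and the toy graph are assumptions chosen for the example, and dangling vertices (no outgoing edges) are not handled.

```python
# Illustrative sketch only: names, bit width, and parameters are assumptions,
# not taken from the paper's hardware design.

FRAC_BITS = 20  # assumed number of fractional bits for the fixed-point format


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a non-negative float to an unsigned fixed-point integer."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def coo_spmv_fixed(rows, cols, vals, x, n, frac_bits=FRAC_BITS):
    """Compute y = A @ x with A stored in COO form as (rows, cols, vals).

    All values are fixed-point integers; each product is truncated back
    to frac_bits fractional bits before accumulation.
    """
    y = [0] * n
    for r, c, v in zip(rows, cols, vals):
        y[r] += (v * x[c]) >> frac_bits
    return y


def personalized_pagerank(edges, n, source, alpha=0.85, iters=50,
                          frac_bits=FRAC_BITS):
    """Power iteration for Personalized PageRank from a single source vertex.

    edges is a list of (src, dst) pairs. Assumes every vertex has at least
    one outgoing edge.
    """
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1

    # Column-stochastic transition matrix in COO form: entry (dst, src).
    rows = [d for _, d in edges]
    cols = [s for s, _ in edges]
    vals = [to_fixed(1.0 / out_deg[s], frac_bits) for s, _ in edges]

    alpha_q = to_fixed(alpha, frac_bits)
    teleport_q = to_fixed(1.0 - alpha, frac_bits)

    p = [0] * n
    p[source] = to_fixed(1.0, frac_bits)
    for _ in range(iters):
        y = coo_spmv_fixed(rows, cols, vals, p, n, frac_bits)
        p = [(alpha_q * yi) >> frac_bits for yi in y]
        p[source] += teleport_q  # restart mass returns to the source vertex
    return [to_float(q, frac_bits) for q in p]


# Toy graph with 3 vertices; vertex 0 is the personalization source.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
scores = personalized_pagerank(toy_edges, n=3, source=0)
```

Because each product is truncated, the scores lose a small amount of probability mass per iteration; varying `FRAC_BITS` in this sketch is one way to see the trade-off between bit width and accuracy that the fixed-point approach exposes.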