The rapid growth of mass spectrometry (MS) data, now exceeding hundreds of terabytes, poses significant challenges for efficient, large-scale spectral library search, a critical component of drug discovery. Conventional processors struggle to handle this data volume efficiently, making in-storage processing (ISP) a promising alternative. This work introduces an ISP architecture built on a 3D ferroelectric NAND (FeNAND) structure, which offers significantly higher density, faster speeds, and lower voltage requirements than conventional NAND flash. Despite its superior density, the NAND structure has not been widely used in ISP applications because its serially connected cells must be read row by row, which limits throughput. To overcome this limitation, we integrate hyperdimensional computing (HDC), a brain-inspired paradigm that enables highly parallel processing with simple operations and strong error tolerance. By combining HDC with the proposed dual-bound approximate matching (D-BAM) distance metric, which is tailored to the FeNAND structure, we parallelize vector computations to enable efficient MS spectral library search, achieving a 43x speedup and 21x higher energy efficiency over state-of-the-art 3D NAND methods while maintaining comparable accuracy.
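To make the HDC-based library search described in the abstract more concrete, the following is a minimal software sketch, not the proposed hardware method. It assumes a common HDC recipe: each spectrum peak is encoded by binding (XOR) a random hypervector for its m/z bin with one for its intensity level, the bound vectors are bundled by bitwise majority vote, and the library is searched by distance. Plain Hamming distance stands in for D-BAM, whose definition is not given in the abstract. All names and sizes (`D`, `NUM_BINS`, `NUM_LEVELS`, `make_item_memory`, `encode`, `hamming`, `search`) are hypothetical and chosen only for illustration.

```python
import random

# Illustrative sizes only; none of these values come from the paper.
D = 2048         # hypervector dimensionality, in bits
NUM_BINS = 64    # number of quantized m/z bins
NUM_LEVELS = 8   # number of quantized intensity levels


def make_item_memory(seed=0):
    """Create random binary hypervectors (stored as Python ints) for bins and levels."""
    rng = random.Random(seed)
    bin_hvs = [rng.getrandbits(D) for _ in range(NUM_BINS)]
    # Assumption: levels use independent random vectors for simplicity.
    level_hvs = [rng.getrandbits(D) for _ in range(NUM_LEVELS)]
    return bin_hvs, level_hvs


def encode(peaks, bin_hvs, level_hvs):
    """Encode a spectrum given as a list of (bin_index, level_index) peaks.

    Each peak is bound with XOR; peaks are bundled by a strict bitwise majority.
    Use an odd number of peaks to avoid ties.
    """
    counts = [0] * D
    for bin_idx, level_idx in peaks:
        bound = bin_hvs[bin_idx] ^ level_hvs[level_idx]
        for i in range(D):
            if (bound >> i) & 1:
                counts[i] += 1
    threshold = len(peaks) / 2
    hv = 0
    for i, c in enumerate(counts):
        if c > threshold:
            hv |= 1 << i
    return hv


def hamming(a, b):
    """Number of differing bits between two hypervectors."""
    return bin(a ^ b).count("1")


def search(query_hv, library_hvs):
    """Return the index of the library hypervector closest to the query."""
    return min(range(len(library_hvs)),
               key=lambda i: hamming(query_hv, library_hvs[i]))
```

In this sketch, a query spectrum is encoded with the same item memory as the library and passed to `search`, which returns the index of the best-matching reference. Every distance computation is an independent XOR and bit count, which is the kind of simple, parallel operation the abstract attributes to HDC; the FeNAND mapping and the D-BAM metric themselves are not modeled here.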