The growth of machine learning (ML) workloads has underscored the importance of efficient memory hierarchies to address bandwidth, latency, and scalability challenges. HERMES focuses on optimizing memory subsystems for RISC-V architectures to meet the computational needs of ML models such as CNNs, RNNs, and Transformers. This project explores state-of-the-art techniques such as advanced prefetching, tensor-aware caching, and hybrid memory models. The cornerstone of HERMES is the integration of shared L3 caches with fine-grained coherence protocols and specialized pathways to deep learning accelerators like Gemmini. Simulation tools like Gem5 and DRAMSim2 are used to evaluate baseline performance and scalability under representative ML workloads. The findings of this study highlight the design choices and anticipated challenges, paving the way for low-latency scalable memory operations for ML applications.
翻译:暂无翻译