This paper presents SynapticCore-X, a modular and resource-efficient neural processing architecture optimized for deployment on low-cost FPGA platforms. The design integrates a lightweight RV32IMC RISC-V control core with a configurable neural compute tile that supports fused matrix, activation, and data-movement operations. Unlike existing FPGA accelerators that rely on heavyweight IP blocks, SynapticCore-X provides a fully open-source SystemVerilog microarchitecture with tunable parallelism, scratchpad memory depth, and DMA burst behavior, enabling rapid exploration of hardware-software co-design trade-offs. We document an automated, reproducible Vivado build pipeline that achieves timing closure at 100 MHz on the Zynq-7020 while consuming only 6.1% of the device's LUTs, 32.5% of its DSPs, and 21.4% of its BRAMs. Hardware validation on the PYNQ-Z2 confirms correct register-level execution, deterministic control-path behavior, and cycle-accurate performance for matrix and convolution kernels. SynapticCore-X demonstrates that energy-efficient NPU-like acceleration can be prototyped on commodity educational FPGAs, lowering the entry barrier for academic and open-hardware research in neural microarchitectures.
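To put the reported utilization figures in absolute terms, the following minimal sketch converts the percentages into approximate resource counts. It assumes the percentages are relative to the full XC7Z020 programmable-logic totals (53,200 LUTs, 220 DSP slices, 140 36 Kb block RAMs); the abstract does not state this, nor whether BRAM is counted in 36 Kb or 18 Kb units. The names `ZYNQ_7020_TOTALS`, `REPORTED_UTILIZATION_PCT`, and `estimate_usage` are illustrative and not part of the SynapticCore-X code base.

```python
# Device totals for the XC7Z020 (Zynq-7020) programmable logic.
ZYNQ_7020_TOTALS = {"LUT": 53_200, "DSP": 220, "BRAM36": 140}

# Utilization percentages as reported in the abstract.
# Assumption: each percentage is relative to the device total above.
REPORTED_UTILIZATION_PCT = {"LUT": 6.1, "DSP": 32.5, "BRAM36": 21.4}


def estimate_usage(totals, pct):
    """Return approximate used and free counts for each resource type."""
    usage = {}
    for name, total in totals.items():
        used = total * pct[name] / 100.0
        usage[name] = {"used": used, "free": total - used}
    return usage


if __name__ == "__main__":
    report = estimate_usage(ZYNQ_7020_TOTALS, REPORTED_UTILIZATION_PCT)
    for name, counts in report.items():
        print(f"{name}: ~{counts['used']:.1f} used, ~{counts['free']:.1f} free")
```

Because the published percentages are rounded, the results are estimates rather than exact counts. Under these assumptions the design leaves most of the fabric free, with DSP slices as the most heavily used resource, which suggests that DSP availability is the first limit on scaling the compute tile's parallelism on this device.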