Transformers face scalability challenges due to the quadratic cost of attention, which requires dense similarity computations between all queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x higher energy efficiency, up to 4x higher throughput, and 6-8x lower area than state-of-the-art accelerators, while maintaining near-lossless accuracy.