How Particle-System Random Batch Methods Enhance Graph Transformer: Memory Efficiency and Parallel Computing Strategy

Attention mechanisms are a core component of Transformer models. They help extract features from embedded vectors by incorporating global information, and their expressivity has been shown to be powerful. However, their quadratic complexity limits their practicality. Although several studies have proposed sparse forms of attention, they lack theoretical analysis of how expressivity is preserved while complexity is reduced. In this paper, we propose Random Batch Attention (RBA), a linear self-attention mechanism with theoretical support for its ability to maintain expressivity. Random Batch Attention has several significant strengths: (1) It has linear time complexity. In addition, it can be implemented in parallel along a new dimension, which yields substantial memory savings. (2) It can improve most existing models by replacing their attention mechanisms, including many previously improved attention mechanisms. (3) It has a theoretical explanation of its convergence, as it derives from Random Batch Methods in computational mathematics. Experiments on large graphs demonstrate the advantages above. In addition, the theoretical modeling of the self-attention mechanism offers a new tool for future research on attention-mechanism analysis.
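The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general random-batch idea applied to attention, not the paper's actual RBA formulation. It assumes that tokens are randomly partitioned into batches of a fixed size and that each token attends only to tokens in its own batch. The function name `random_batch_attention`, the `batch_size` parameter, and the use of scaled dot-product scores are all illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def random_batch_attention(q, k, v, batch_size, rng):
    """Illustrative only: attention restricted to random batches of tokens.

    q, k, v are lists of n vectors (lists of floats). Token indices are
    shuffled and cut into consecutive batches; token i attends only to
    the tokens j that landed in the same batch.
    """
    n = len(q)
    d = len(q[0])
    order = list(range(n))
    rng.shuffle(order)  # random partition of token indices
    out = [None] * n
    for start in range(0, n, batch_size):
        batch = order[start:start + batch_size]
        for i in batch:
            # Scaled dot-product scores against batch members only.
            scores = [dot(q[i], k[j]) / math.sqrt(d) for j in batch]
            weights = softmax(scores)
            out[i] = [
                sum(w * v[j][c] for w, j in zip(weights, batch))
                for c in range(len(v[0]))
            ]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 8, 4
    q = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    v = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    result = random_batch_attention(q, k, v, batch_size=2, rng=rng)
    print(len(result), len(result[0]))
```

Under these assumptions, each batch of size p costs on the order of p²·d operations and there are about n/p batches, so the total cost grows linearly in n for a fixed p. The batches are independent of one another, which is one plausible reading of the parallelism the abstract mentions.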


