Leveraging attention sparsity to accelerate long-context large language models (LLMs) has been a hot research topic. However, current algorithms such as sparse attention or key-value (KV) cache compression tend to use a fixed budget, which poses a significant challenge in deployment because it fails to account for the dynamic nature of real-world scenarios, where the optimal balance between accuracy and efficiency can vary greatly. In this paper, we find that adapting top-$p$ sampling (nucleus sampling) to sparse attention can surprisingly achieve adaptive budgeting. Based on this finding, we propose Twilight, a framework that brings adaptive sparsity to any existing sparse attention algorithm without sacrificing its accuracy. Empirical results show that Twilight can adaptively prune up to 98% of redundant tokens, leading to a $15.4\times$ speedup in self-attention operations and a $3.9\times$ speedup in end-to-end per-token latency for long-context LLM decoding.
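To illustrate the core idea, below is a minimal sketch (not the paper's implementation) of top-$p$ ("nucleus") selection applied to a single query's attention distribution: the budget becomes the smallest set of keys whose attention mass reaches a threshold $p$, so the number of retained tokens adapts to how peaked the distribution is instead of being a fixed top-$k$. The function name and shapes here are illustrative assumptions.

```python
import torch

def top_p_attention_mask(scores: torch.Tensor, p: float = 0.98) -> torch.Tensor:
    """Return a boolean mask keeping the smallest set of keys whose
    attention probabilities sum to at least p.

    scores: pre-softmax attention logits for one query, shape [num_keys].
    """
    probs = torch.softmax(scores, dim=-1)
    # Sort probabilities in descending order and accumulate their mass.
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Number of keys needed for the cumulative mass to reach p
    # (at least one key is always kept).
    keep = int(torch.searchsorted(cumulative, torch.tensor(p)).item()) + 1
    mask = torch.zeros_like(probs, dtype=torch.bool)
    mask[sorted_idx[:keep]] = True
    return mask

# Example: a peaked distribution keeps few keys, a flat one keeps many,
# so the effective budget adapts per query.
scores = torch.randn(4096)
mask = top_p_attention_mask(scores, p=0.95)
print(mask.sum().item(), "of", mask.numel(), "keys kept")
```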