The Key-Value (KV) cache is central to the efficiency of transformer-based large language models (LLMs), storing previously computed key and value vectors to accelerate inference. Yet, as sequence length and batch size grow, the cache becomes a major memory bottleneck. Prior compression methods typically apply low-rank decomposition to keys alone or attempt to jointly embed queries and keys, but both approaches neglect that attention fundamentally depends on the query-key inner products. In this work, we prove that such strategies are suboptimal for approximating the attention matrix. We introduce KQ-SVD, a simple and computationally efficient method that directly performs an optimal low-rank decomposition of the attention matrix via a closed-form solution. By targeting the true source of redundancy, KQ-SVD preserves attention outputs with higher fidelity under compression. Extensive evaluations on LLaMA and Mistral models demonstrate that our approach consistently delivers superior projection quality.
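The claim that compressing keys alone is suboptimal rests on a standard linear-algebra fact: the truncated SVD of the attention-score matrix itself is, by Eckart-Young, the best rank-r approximation in Frobenius norm, whereas any rank-r factorization of K alone can only do worse or equal. The sketch below illustrates this point on random data; it is not the paper's KQ-SVD implementation, and all shapes, the target rank, and the synthetic Q and K are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): compare two rank-r approximations
# of the attention logits S = Q K^T / sqrt(d).
#   (a) compress K alone via its truncated SVD, then recompute Q K_r^T;
#   (b) truncate the SVD of S itself (Frobenius-optimal by Eckart-Young).
import numpy as np

rng = np.random.default_rng(0)
n, d, r = 256, 64, 16            # sequence length, head dim, target rank (assumed)

Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
S = Q @ K.T / np.sqrt(d)         # exact attention logits

# (a) low-rank keys only: K ~= U_r diag(s_r) V_r^T
Uk, sk, Vtk = np.linalg.svd(K, full_matrices=False)
K_r = (Uk[:, :r] * sk[:r]) @ Vtk[:r]
S_keys_only = Q @ K_r.T / np.sqrt(d)

# (b) optimal rank-r approximation of S directly
Us, ss, Vts = np.linalg.svd(S, full_matrices=False)
S_opt = (Us[:, :r] * ss[:r]) @ Vts[:r]

rel_err = lambda A: np.linalg.norm(S - A) / np.linalg.norm(S)
print(f"keys-only rank-{r} error: {rel_err(S_keys_only):.4f}")
print(f"optimal   rank-{r} error: {rel_err(S_opt):.4f}")  # never larger than (a)
```

Since Q K_r^T also has rank at most r, its Frobenius error can never fall below that of the truncated SVD of S, which is the sense in which key-only compression is provably suboptimal for approximating the attention matrix.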