The scalability of large language models for long-context reasoning is severely constrained by the linear growth of the Transformer key-value (KV) cache, which incurs significant memory and computational costs. We posit that as a model generates reasoning tokens, the informational value of past generated tokens diminishes, creating an opportunity for compression. In this work, we propose to periodically compress the generation KV cache with a learned, special-purpose token and evict the compressed entries. We train the model to perform this compression via a modified joint distillation and reinforcement learning (RL) framework. Our training method adds minimal overhead to the conventional RL process, as it leverages the RL outputs for distillation. Empirically, our method achieves a superior memory-accuracy Pareto frontier compared to both the model without cache compression and training-free compression techniques.
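To make the periodic compress-and-evict idea concrete, the following is a minimal sketch of the decoding loop, under assumptions not specified in the abstract: every `compress_every` generated tokens, a reserved compression token (`COMPRESS_TOKEN_ID`) is appended so its KV entry can summarize the current window, and the window's entries are then evicted. The `DummyModel`, `forward_kv`, and `sample_next` names are illustrative stand-ins, not the paper's actual interfaces.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

# One cached (key, value) vector pair per position; placeholder representation.
KVEntry = Tuple[List[float], List[float]]

@dataclass
class KVCache:
    entries: List[KVEntry] = field(default_factory=list)

    def append(self, kv: KVEntry) -> None:
        self.entries.append(kv)

    def evict_range(self, start: int, end: int) -> None:
        # Drop the entries whose information the compression token now summarizes.
        del self.entries[start:end]

    def __len__(self) -> int:
        return len(self.entries)

class DummyModel:
    """Stand-in for the real LM: emits random tokens and random KV vectors."""
    def __init__(self, vocab_size: int = 100, dim: int = 4):
        self.vocab_size, self.dim = vocab_size, dim

    def forward_kv(self, token_id: int, cache: KVCache) -> KVEntry:
        # A real model would attend over `cache`; here we just emit placeholders.
        return ([random.random() for _ in range(self.dim)],
                [random.random() for _ in range(self.dim)])

    def sample_next(self, cache: KVCache) -> int:
        return random.randrange(1, self.vocab_size)

COMPRESS_TOKEN_ID = 0  # assumed id reserved for the learned compression token

def generate_with_compression(model, prompt_ids, compress_every=8, max_new_tokens=32):
    cache = KVCache()
    for tok in prompt_ids:                 # prefill: prompt KV kept uncompressed in this sketch
        cache.append(model.forward_kv(tok, cache))
    window_start = len(cache)              # first index of the current uncompressed window
    output_ids = []

    for _ in range(max_new_tokens):
        next_id = model.sample_next(cache)
        output_ids.append(next_id)
        cache.append(model.forward_kv(next_id, cache))

        if len(output_ids) % compress_every == 0:
            # Compute the compression token's KV entry while the window is still visible,
            # then evict the window and keep only the summary entry.
            summary_kv = model.forward_kv(COMPRESS_TOKEN_ID, cache)
            cache.evict_range(window_start, len(cache))
            cache.append(summary_kv)
            window_start = len(cache)      # next window starts after the summary

    return output_ids, len(cache)

if __name__ == "__main__":
    ids, cache_len = generate_with_compression(DummyModel(), prompt_ids=[5, 6, 7])
    print(f"generated {len(ids)} tokens, cache holds {cache_len} entries")
```

Under these settings the cache grows by at most one window plus one summary entry per compression period, rather than linearly in the number of generated tokens; the training of the compression token itself (the distillation-plus-RL objective) is outside the scope of this sketch.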