Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP (GRouped Activation Shared Parameterization), a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K << D groups and learns a shared scaling and shifting vector for each group. This grouped modulation sharply reduces the number of trainable parameters while preserving the model's ability to learn task-specific features. Building on this formulation, we further propose StochGRASP, which learns Gaussian distributions, rather than deterministic values, as perturbations to the pre-trained weights. This probabilistic parameterization, together with a noise-aware loss formulation, models hardware-level variability in programmed weights and substantially improves robustness under non-ideal inference conditions, an important requirement for deployment on emerging edge AI hardware. Across GLUE (RoBERTa-base and RoBERTa-large) and E2E NLG (GPT-2 Medium), GRASP matches or exceeds the performance of established PEFT methods while achieving an order-of-magnitude reduction in trainable parameters compared to LoRA and BitFit. Under varying levels of noise, StochGRASP consistently outperforms its deterministic counterparts, demonstrating its suitability for energy-efficient, noise-prone hardware platforms.
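To make the grouped modulation concrete, the following is a minimal PyTorch-style sketch of how GRASP's shared per-group scale-and-shift and StochGRASP's Gaussian perturbation could be realized. It assumes contiguous groups of equal size and uses the reparameterization trick for sampling; the module names, initialization choices, and training/eval behavior are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GRASPModulation(nn.Module):
    """Grouped scale-and-shift over a D-dimensional hidden state.

    Sketch under assumed conventions: the D dimensions are split into K
    contiguous groups; each group shares one learnable scale and one shift,
    so only 2*K parameters are trained per modulated layer.
    """

    def __init__(self, hidden_dim: int, num_groups: int):
        super().__init__()
        assert hidden_dim % num_groups == 0, "assumes D divisible by K"
        self.group_size = hidden_dim // num_groups
        # Per-group scale (identity init) and shift (zero init).
        self.scale = nn.Parameter(torch.ones(num_groups))
        self.shift = nn.Parameter(torch.zeros(num_groups))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, D). Broadcast each group's scalar over its dims.
        scale = self.scale.repeat_interleave(self.group_size)
        shift = self.shift.repeat_interleave(self.group_size)
        return h * scale + shift


class StochGRASPModulation(nn.Module):
    """Stochastic variant: each group's scale/shift is a Gaussian with learned
    mean and log-variance; training samples via the reparameterization trick
    to emulate hardware-level weight noise (details here are assumptions)."""

    def __init__(self, hidden_dim: int, num_groups: int):
        super().__init__()
        assert hidden_dim % num_groups == 0, "assumes D divisible by K"
        self.group_size = hidden_dim // num_groups
        self.scale_mu = nn.Parameter(torch.ones(num_groups))
        self.scale_logvar = nn.Parameter(torch.full((num_groups,), -6.0))
        self.shift_mu = nn.Parameter(torch.zeros(num_groups))
        self.shift_logvar = nn.Parameter(torch.full((num_groups,), -6.0))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Sample noisy per-group parameters: mu + sigma * eps.
            scale = self.scale_mu + torch.randn_like(self.scale_mu) * torch.exp(0.5 * self.scale_logvar)
            shift = self.shift_mu + torch.randn_like(self.shift_mu) * torch.exp(0.5 * self.shift_logvar)
        else:
            scale, shift = self.scale_mu, self.shift_mu
        return h * scale.repeat_interleave(self.group_size) + shift.repeat_interleave(self.group_size)
```

For a RoBERTa-base hidden size of D = 768 with K = 16 groups, such a module would train 32 parameters per modulated layer, which is consistent with the order-of-magnitude savings over LoRA and BitFit claimed above.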