Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT's orthogonal stability with LoRA-GA's gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval, using models ranging from 7B to 405B parameters, the hybrid approach yields consistent gains over three independent runs per task and model, approaching the quality of full fine-tuning while cutting training time by a factor of roughly 2.1 and peak memory usage by nearly 50 percent. A compact multilingual and low-resource study on XNLI and FLORES, using 32 examples per language, further demonstrates consistent gains under the same budget with a small and stable memory footprint. These results indicate a practical and scalable path toward accessible LLM fine-tuning under resource constraints.
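To make the hybrid strategy concrete, the sketch below illustrates one plausible reading of a per-layer update that blends a BOFT-style orthogonal rotation (here parameterized via a Cayley transform) with a LoRA-GA-style gradient-aligned low-rank delta, gated by the layer's gradient norm. The gating rule, the `tau` threshold, and the skew-symmetric construction are illustrative assumptions for exposition, not the paper's exact method.

```python
import numpy as np

def cayley_orthogonal(skew):
    # The Cayley transform maps a skew-symmetric matrix to an orthogonal one:
    # Q = (I + A)^{-1} (I - A). Used here as a stand-in for BOFT's
    # orthogonal (butterfly-factorized) parameterization.
    n = skew.shape[0]
    eye = np.eye(n)
    return np.linalg.solve(eye + skew, eye - skew)

def hybrid_layer_update(W, grad, lr=1e-2, rank=2, tau=1.0):
    """Illustrative hybrid PEFT step for one weight matrix W (m x n).

    Blends a gradient-aligned low-rank delta (LoRA-GA-like) with an
    orthogonality-preserving rotation (BOFT-like), weighted by the
    layer's gradient norm. All hyperparameters are assumptions.
    """
    # LoRA-GA-style component: align the low-rank update with the top
    # singular directions of the gradient (truncated SVD).
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    delta = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]

    # BOFT-style component: a small orthogonal rotation built from a
    # skew-symmetric projection of grad W^T (skew guarantees Q Q^T = I).
    skew = 0.5 * lr * (grad @ W.T - W @ grad.T)
    Q = cayley_orthogonal(skew)

    # Gradient-norm gate: large gradients favor the fast low-rank path,
    # small gradients favor the stable orthogonal path.
    g = np.linalg.norm(grad) / (np.linalg.norm(grad) + tau)
    return g * (W - lr * delta) + (1.0 - g) * (Q @ W)
```

In this sketch the gate `g` varies per layer and per step, so layers with large gradient norms lean on the rapidly converging low-rank path while quieter layers receive norm-preserving rotations, mirroring the stability/speed trade-off the abstract describes.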