We present a governance-aware hybrid fine-tuning framework for multilingual, low-resource adaptation of large language models. The core algorithm combines gradient-aligned low-rank updates with structured orthogonal transformations through layer-wise mixing, and introduces unitary constraints in selected sub-layers to stabilize deep optimization. In tandem with lightweight, label-free data governance steps (language identification, near-duplicate removal, and quality filtering), the framework targets accuracy, calibration, and cross-language parity under tight compute budgets. Across XNLI and FLORES, the hybrid approach delivers consistent gains over strong PEFT baselines while maintaining directional balance and improving probability calibration (Tables II and III). It is also more resilient to lightweight orthographic variants (Table IV) and benefits additively from the simple governance steps (Table V). Training-footprint measurements indicate modest overhead and a favorable cost-quality frontier (Table VI, Figure 2). Together, these results show that hybrid and unitary PEFT, paired with practical data governance, offer a stable and accessible path to resource-efficient multilingual adaptation.
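To make the layer-wise mixing concrete, the following is a minimal sketch, assuming a PyTorch implementation. The class name HybridAdapter, the rank r, the sigmoid-gated mixing weight alpha, and the residual composition are illustrative assumptions, not the paper's exact formulation; the orthogonal weight parametrization stands in for the unitary constraint on real-valued weights.

```python
# Minimal sketch of a hybrid adapter: a LoRA-style low-rank update mixed
# with an orthogonally constrained transform via a learned, per-layer gate.
# All names and the exact composition are illustrative assumptions.
import torch
import torch.nn as nn
from torch.nn.utils import parametrizations


class HybridAdapter(nn.Module):
    """Mixes a low-rank update with an orthogonal transform of the input."""

    def __init__(self, d_model: int, r: int = 8):
        super().__init__()
        # Low-rank branch B @ A; B starts at zero so the update begins as identity.
        self.lora_A = nn.Parameter(torch.randn(r, d_model) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_model, r))
        # Orthogonal branch: weight kept orthogonal by PyTorch's parametrization,
        # a real-valued stand-in for the paper's unitary constraint.
        self.ortho = parametrizations.orthogonal(
            nn.Linear(d_model, d_model, bias=False)
        )
        # Layer-wise mixing coefficient, learned independently per adapter.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        low_rank = x @ self.lora_A.T @ self.lora_B.T  # (batch, d_model)
        rotated = self.ortho(x)
        a = torch.sigmoid(self.alpha)  # keep the mix weight in (0, 1)
        # Residual composition: blend the two correction branches around x.
        return x + a * low_rank + (1.0 - a) * (rotated - x)


# Usage: a frozen base layer with the adapter applied to its output.
base = nn.Linear(768, 768)
for p in base.parameters():
    p.requires_grad = False
adapter = HybridAdapter(768, r=8)
h = adapter(base(torch.randn(4, 768)))
```

In this sketch, only the adapter's parameters are trainable, and the sigmoid gate keeps the per-layer mix bounded, which is one simple way to realize the stable layer-wise blending the abstract describes.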