This position paper challenges the "scaling fundamentalism" dominating AI research, where unbounded growth in model size and computation has led to unsustainable environmental impacts and widening resource inequality. We argue that LLM development should be fundamentally reoriented toward capability-per-resource rather than capability alone. We present a theoretical framework demonstrating that resource-allocation decisions guided by gradient influence patterns can dramatically improve efficiency throughout the AI lifecycle. Our analysis shows that in transformer-based models, where a small fraction of parameters exert outsized influence (following heavy-tailed distributions), three critical insights emerge: (1) updating only high-influence parameters strictly outperforms full-parameter tuning on a performance-per-resource basis; (2) simple gradient norms provide computationally efficient proxies for identifying these high-influence components; and (3) coordinated parameter and data selection yields multiplicative efficiency gains, potentially reducing resource requirements by orders of magnitude. Building on these theoretical foundations, we propose a two-stage paradigm: marginal-return pretraining for foundation developers and influence-guided adaptation for downstream users, bridged by gradient blueprints, i.e., metadata describing which parameters matter most for various tasks. This capability-per-resource perspective transforms what were once considered pragmatic hardware workarounds into theoretically optimal strategies, democratizing access to cutting-edge AI capabilities while significantly reducing environmental impact. By embedding resource consciousness into how we develop, adapt, and evaluate models, we can reshape AI progress toward a more sustainable and equitable future.
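To make insight (2) concrete, the sketch below illustrates one way gradient norms could serve as a cheap proxy for parameter influence: score each parameter tensor by the norm of its gradient on a small probe batch, then fine-tune only the top fraction and freeze the rest. This is a minimal illustrative sketch, not the paper's actual procedure; PyTorch, the function names, `keep_fraction`, and the probe batch are all assumptions introduced here for exposition.

```python
# Minimal sketch (illustrative only): rank parameter tensors by gradient norm
# on a probe batch, then restrict fine-tuning to the highest-scoring subset.
import torch
import torch.nn as nn


def select_high_influence_params(model: nn.Module, loss_fn, probe_batch,
                                 keep_fraction: float = 0.05):
    """Return names of the parameter tensors with the largest gradient norms."""
    model.zero_grad()
    inputs, targets = probe_batch
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Score each parameter tensor by the L2 norm of its gradient (a cheap
    # proxy for influence, per insight (2) in the abstract).
    scores = {name: p.grad.norm().item()
              for name, p in model.named_parameters() if p.grad is not None}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return set(ranked[:n_keep])


def freeze_low_influence_params(model: nn.Module, keep_names):
    """Freeze everything except the selected high-influence tensors."""
    for name, p in model.named_parameters():
        p.requires_grad = name in keep_names


# Usage on a toy model with a synthetic probe batch (hypothetical setup).
if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
    probe = (torch.randn(8, 16), torch.randint(0, 2, (8,)))
    keep = select_high_influence_params(model, nn.CrossEntropyLoss(), probe)
    freeze_low_influence_params(model, keep)
    trainable = [n for n, p in model.named_parameters() if p.requires_grad]
    print("Updating only:", trainable)
```

Under this reading, the resulting set of tensor names is one possible instantiation of a "gradient blueprint": downstream users could reuse it to decide which parameters to adapt without recomputing influence from scratch.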