As large pre-trained language models become increasingly central to natural language understanding (NLU) tasks, their substantial computational and memory requirements have raised significant economic and environmental concerns. To address these challenges, this paper introduces the Elastic Language Model (ELM), a novel neural architecture search (NAS) method tailored to compact language models. ELM extends existing NAS approaches with a flexible search space comprising efficient transformer blocks and dynamic modules for adjusting hidden dimensions and the number of attention heads. These innovations make the search process more efficient and flexible, enabling a more thorough and effective exploration of candidate architectures. We also introduce novel knowledge distillation losses that preserve the distinct characteristics of each block, improving the discrimination among architectural choices during the search. Experiments on masked language modeling and causal language modeling tasks demonstrate that models discovered by ELM significantly outperform existing methods.