AfterLearnER (After Learning Evolutionary Retrofitting) applies evolutionary optimization to refine fully trained machine learning models: it optimizes a carefully chosen set of the model's parameters or hyperparameters with respect to an actual, exact, and hence possibly non-differentiable error signal, computed on a subset of the standard validation set. The efficiency of AfterLearnER is demonstrated on non-differentiable signals such as threshold-based criteria in depth sensing, the word error rate in speech re-synthesis, the number of kills per life in Doom, computational accuracy or BLEU in code translation, image quality in 3D generative adversarial networks (GANs), and user feedback in image generation via Latent Diffusion Models (LDMs). This retrofitting can be done after training, or dynamically at inference time by taking user feedback into account. The advantages of AfterLearnER are its versatility, its ability to use non-differentiable feedback, including human evaluations (no gradient is needed), its limited overfitting, which is supported by a theoretical study, and its anytime behavior. Last but not least, AfterLearnER requires only a small amount of feedback, i.e., a few dozen to a few hundred scalars, compared to the tens of thousands needed in most related published works.
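The retrofitting loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a specific evolutionary algorithm, so a simple (1+1) evolution strategy is assumed here, and the toy "model" (a two-parameter affine calibration of depth predictions), the synthetic validation subset, and the names `threshold_error` and `retrofit` are all hypothetical. The sketch only shows the general idea of tuning a few parameters of a frozen model against a threshold-based, non-differentiable error, using a small budget of scalar feedback values.

```python
import random


def threshold_error(params, samples, tol=0.05):
    """Fraction of samples whose relative error exceeds tol.

    This criterion is piecewise constant in params, so it provides
    no useful gradient (hypothetical stand-in for a depth metric).
    """
    scale, shift = params
    bad = 0
    for raw, target in samples:
        pred = scale * raw + shift
        if abs(pred - target) > tol * abs(target):
            bad += 1
    return bad / len(samples)


def retrofit(params, samples, budget=200, sigma=0.1, seed=0):
    """Assumed (1+1) evolution strategy with elitist acceptance.

    Each iteration costs exactly one scalar of feedback. The best
    candidate so far is always kept, so the loop can be stopped at
    any time (anytime behavior).
    """
    rng = random.Random(seed)
    best = list(params)
    best_err = threshold_error(best, samples)
    history = [best_err]
    for _ in range(budget):
        # Gaussian mutation of the current best parameters.
        cand = [p + sigma * rng.gauss(0.0, 1.0) for p in best]
        err = threshold_error(cand, samples)
        if err <= best_err:
            best, best_err = cand, err
            sigma *= 1.5  # success: widen the search
        else:
            sigma *= 0.9  # failure: narrow the search
        history.append(best_err)
    return best, best_err, history


# Synthetic validation subset (assumption): targets follow 2.0 * raw + 0.5.
data_rng = random.Random(1)
samples = [(r, 2.0 * r + 0.5)
           for r in (data_rng.uniform(1.0, 10.0) for _ in range(50))]

init = [1.8, 0.3]  # miscalibrated parameters of the "trained" model
best, best_err, history = retrofit(init, samples)
print("initial error:", threshold_error(init, samples))
print("retrofitted error:", best_err)
print("feedback scalars used:", len(history))
```

With the default budget, the loop consumes 201 scalar evaluations, which falls within the "few dozen to a few hundred" range mentioned in the abstract. Because acceptance is elitist, the recorded error never increases from one iteration to the next.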