Compositional generalization, a key open challenge in modern machine learning, requires models to predict unknown combinations of known concepts. However, assessing compositional generalization remains difficult due to the lack of standardized evaluation protocols and the limitations of current benchmarks, which often favor efficiency over rigor. At the same time, general-purpose vision architectures lack the necessary inductive biases, and existing approaches to endow them with such biases compromise scalability. As a remedy, this paper introduces: 1) a rigorous evaluation framework that unifies and extends previous approaches while reducing computational requirements from combinatorial to constant; 2) an extensive, up-to-date evaluation of the state of compositional generalization in supervised vision backbones, training more than 5,000 models; 3) Attribute Invariant Networks, a class of models that establishes a new Pareto frontier in compositional generalization, achieving a 23.43% accuracy improvement over baselines while reducing parameter overhead from 600% to 16% relative to fully disentangled counterparts.