Casting complex inputs into tractable representations is a critical step across many fields. Diverse embedding models arise from differences in architectures, loss functions, input modalities, and training datasets, each capturing unique aspects of the input. Multi-teacher distillation leverages this diversity to enrich representations, but it often remains tailored to specific tasks. In this paper, we introduce a task-agnostic framework based on a ``majority vote'' objective function. We demonstrate that this objective is bounded by the mutual information between the student's and the teachers' embeddings, leading to a task-agnostic distillation loss that eliminates dependence on task-specific labels or prior knowledge. Our evaluations across text, vision, and molecular modeling show that our method effectively leverages teacher diversity, yielding representations that enable better performance on a wide range of downstream tasks such as classification, clustering, and regression. Additionally, we train and release state-of-the-art embedding models, enhancing downstream performance across these modalities.
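To make the setup concrete, below is a minimal sketch of a task-agnostic multi-teacher distillation loss in PyTorch: the student is trained to maximize an InfoNCE-style lower bound on the mutual information between its embedding and each frozen teacher's embedding, averaged over teachers, with no task labels involved. This is not the paper's exact "majority vote" formulation; the class name, per-teacher projection heads, temperature, and the in-batch-negatives estimator are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's exact objective) of a
# task-agnostic multi-teacher distillation loss based on a contrastive
# (InfoNCE-style) mutual-information lower bound between the student's
# embedding and each teacher's embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTeacherMIDistillLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dims: list[int],
                 proj_dim: int = 256, temperature: float = 0.07):
        super().__init__()
        # One projection head per teacher maps student and teacher embeddings
        # into a shared space (a hypothetical design choice for this sketch).
        self.student_projs = nn.ModuleList(
            [nn.Linear(student_dim, proj_dim) for _ in teacher_dims])
        self.teacher_projs = nn.ModuleList(
            [nn.Linear(d, proj_dim) for d in teacher_dims])
        self.temperature = temperature

    def forward(self, student_emb: torch.Tensor,
                teacher_embs: list[torch.Tensor]) -> torch.Tensor:
        losses = []
        for s_proj, t_proj, t_emb in zip(self.student_projs,
                                         self.teacher_projs, teacher_embs):
            s = F.normalize(s_proj(student_emb), dim=-1)
            t = F.normalize(t_proj(t_emb.detach()), dim=-1)  # teachers stay frozen
            logits = s @ t.T / self.temperature  # (batch, batch) similarities
            targets = torch.arange(s.size(0), device=s.device)
            # InfoNCE: the matching student/teacher pair is the positive,
            # all other in-batch pairs serve as negatives.
            losses.append(F.cross_entropy(logits, targets))
        # Average the per-teacher bounds so every teacher contributes equally.
        return torch.stack(losses).mean()


# Usage with illustrative shapes: a batch of 32 inputs, a 384-d student,
# and two teachers producing 768-d and 512-d embeddings.
loss_fn = MultiTeacherMIDistillLoss(student_dim=384, teacher_dims=[768, 512])
student = torch.randn(32, 384)
teachers = [torch.randn(32, 768), torch.randn(32, 512)]
loss = loss_fn(student, teachers)
loss.backward()
```

Because the contrastive term only compares embeddings within a batch, the sketch needs no task-specific labels, which is the property the task-agnostic loss in the paper is designed to preserve.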