Combining diverse foundation models is promising, but weight merging is limited by mismatched architectures and closed APIs. Trinity addresses this with a lightweight coordinator that orchestrates collaboration among large language models (LLMs). The coordinator, comprising a compact language model (approximately $0.6$B parameters) and a lightweight head (approximately $10$K parameters), is optimized with an evolutionary strategy for efficient and adaptive delegation. Trinity processes queries over multiple turns: at each turn the coordinator assigns one of three roles (Thinker, Worker, or Verifier) to a selected LLM, effectively offloading complex skill acquisition from the coordinator itself. Experiments show that Trinity consistently outperforms individual models and existing methods across coding, math, reasoning, and domain-knowledge tasks, and generalizes robustly to out-of-distribution tasks. On standard benchmarks, Trinity achieves state-of-the-art results, including 86.2% on LiveCodeBench. Theoretical and empirical analyses identify two main factors behind this performance: (1) the coordinator's hidden-state representations provide rich contextualization of inputs, and (2) under high dimensionality and strict budget constraints, the separable Covariance Matrix Adaptation Evolution Strategy (sep-CMA-ES) offers advantages over reinforcement learning, imitation learning, and random search by exploiting potential block-$\epsilon$-separability.
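To make the delegation loop concrete, the following is a minimal Python sketch of the multi-turn mechanism the abstract describes. It is not the authors' implementation: all names (`CoordinatorHead`, `encode_hidden_state`, `pool_of_llms`, `run_trinity`) are hypothetical, and the sketch assumes only what the abstract states, namely that a compact LM yields a hidden-state representation of the context and a small head maps it to a (role, model) assignment at each turn.

```python
# Hypothetical sketch of Trinity-style multi-turn delegation (not the authors' code).
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

ROLES = ("Thinker", "Worker", "Verifier")


@dataclass
class CoordinatorHead:
    """Lightweight head mapping the coordinator LM's hidden state to a joint
    (role, model) assignment. Its ~10K parameters would be the only weights
    tuned by the evolutionary strategy."""
    weights: np.ndarray  # shape: (hidden_dim, n_roles * n_models)

    def assign(self, hidden_state: np.ndarray, n_models: int) -> Tuple[str, int]:
        logits = hidden_state @ self.weights
        idx = int(np.argmax(logits))
        return ROLES[idx // n_models], idx % n_models


def run_trinity(query: str,
                encode_hidden_state: Callable[[str], np.ndarray],
                head: CoordinatorHead,
                pool_of_llms: List[Callable[[str, str], str]],
                max_turns: int = 3) -> str:
    """Process a query over multiple turns: each turn, the coordinator picks
    one role and one LLM from the pool, then appends that LLM's output to
    the running context."""
    context = query
    for _ in range(max_turns):
        h = encode_hidden_state(context)  # hidden state of the compact LM
        role, model_idx = head.assign(h, len(pool_of_llms))
        output = pool_of_llms[model_idx](role, context)
        context = f"{context}\n[{role}]: {output}"
    return context
```

Under this framing, only `CoordinatorHead.weights` (on the order of $10$K scalars) are optimized, which is one reason a separable strategy is attractive: sep-CMA-ES restricts the covariance matrix to its diagonal, so each update scales linearly rather than quadratically in the parameter dimension, keeping the search tractable under a strict evaluation budget.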