Best of mini-N in-loop Sampling: A Contextual Quality Reward Model for Reliable and Efficient Best-of-N Sampling

Modern preference alignment techniques, such as Best-of-N (BoN) sampling, rely on reward models trained with pairwise comparison data. While effective at learning relative preferences, this paradigm fails to capture a signal of response acceptability, leaving systems vulnerable to selecting the least bad of many unacceptable options. This is particularly problematic for hard prompts, where the risk of such false acceptances increases with the number of samples. In this paper, we address this critical reliability gap by introducing a new data collection and modeling framework. By augmenting preference data with an outside option, inspired by discrete choice models, we train a reward model that can distinguish not just what is better, but what is good enough. We leverage this capability to create an adaptive inference strategy, best of mini-N in-loop, which partitions the generation budget into sequential loops with a calibrated early-exit condition. Our experiments show that when tuned as an alignment guardrail, it reduces reliability failures by 70%, and when tuned as an inference accelerator, it improves average inference speed by over 22% in the IMDB-sentiment setting. We thus provide a principled and flexible framework for practitioners to explicitly manage the trade-off between reliability and computational efficiency.
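The in-loop strategy described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `best_of_mini_n_in_loop`, the `generate` and `reward` callables, the parameter names, and the convention that a score at or above `threshold` means "good enough" (that is, preferred to the outside option) are all assumptions made for illustration. Returning `None` when the budget runs out is likewise one possible choice; falling back to the best response seen so far would be another.

```python
from typing import Callable, Optional, Tuple


def best_of_mini_n_in_loop(
    prompt: str,
    generate: Callable[[str], str],       # hypothetical sampler: prompt -> response
    reward: Callable[[str, str], float],  # hypothetical reward model: (prompt, response) -> score
    mini_n: int = 4,                      # samples drawn per loop
    max_loops: int = 4,                   # total budget is mini_n * max_loops
    threshold: float = 0.0,               # assumed acceptability cutoff (outside option)
) -> Tuple[Optional[str], float, int]:
    """Return (response, best_score, samples_used).

    The response is None if no sample reached the threshold within the budget.
    """
    best_response: Optional[str] = None
    best_score = float("-inf")
    samples_used = 0

    for _ in range(max_loops):
        # Draw one mini-batch of candidates and track the best seen so far.
        for _ in range(mini_n):
            candidate = generate(prompt)
            score = reward(prompt, candidate)
            samples_used += 1
            if score > best_score:
                best_response, best_score = candidate, score

        # Early exit: stop as soon as the best candidate is acceptable.
        if best_score >= threshold:
            return best_response, best_score, samples_used

    # Budget exhausted without an acceptable response: abstain (an assumption).
    return None, best_score, samples_used
```

In this sketch, `threshold` and `mini_n` are the knobs: a stricter threshold rejects more "least bad" responses, while a looser one lets easy prompts exit after the first loop and spend fewer samples.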
