Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text, consistent with well-established findings in cognitive psychology. We formalize this bias theoretically, verify it empirically on preference datasets, and show that it plays a central role in mode collapse. Motivated by this analysis, we introduce Verbalized Sampling (VS), a simple, training-free prompting strategy that circumvents mode collapse. VS prompts the model to verbalize a probability distribution over a set of responses (e.g., "Generate 5 jokes about coffee and their corresponding probabilities"). Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy or safety. For instance, in creative writing, VS increases diversity by 1.6-2.1x over direct prompting. We further observe an emergent trend: more capable models benefit more from VS. In sum, our work provides a new data-centric perspective on mode collapse and a practical inference-time remedy that helps unlock the generative diversity of pre-trained models.
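For concreteness, the sketch below illustrates how VS could be applied at inference time. It assumes an OpenAI-compatible chat API; the prompt wording, the JSON response format, and the helper names (`verbalized_sampling`, `VS_PROMPT`) are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of Verbalized Sampling (VS) as an inference-time prompting strategy.
# Assumption: an OpenAI-compatible chat API and a model that returns well-formed JSON;
# the prompt text and parsing below are illustrative, not the paper's official code.
import json
import random

from openai import OpenAI

client = OpenAI()

# Ask the model to verbalize a distribution over candidate responses.
VS_PROMPT = (
    "Generate 5 jokes about coffee and their corresponding probabilities. "
    'Respond only with JSON: [{"text": "...", "probability": 0.2}, ...]'
)


def verbalized_sampling(prompt: str = VS_PROMPT, model: str = "gpt-4o") -> str:
    """Query the model once, then sample one response from its verbalized distribution."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    candidates = json.loads(completion.choices[0].message.content)

    # Normalize the verbalized probabilities and sample a single response,
    # rather than always taking the single most typical (modal) completion.
    weights = [max(float(c.get("probability", 0.0)), 0.0) for c in candidates]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    texts = [c["text"] for c in candidates]
    return random.choices(texts, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(verbalized_sampling())
```

Because the diversity comes from the prompt itself, this approach requires no retraining or decoding changes; the verbalized probabilities can also be kept for downstream filtering or reweighting instead of sampling.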