Recent advances in reasoning techniques have substantially improved the performance of large language models (LLMs), raising expectations for their ability to provide accurate, truthful, and reliable information. However, emerging evidence suggests that iterative reasoning may foster belief entrenchment and confirmation bias rather than enhance truth-seeking behavior. In this study, we propose a systematic framework for evaluating belief entrenchment in LLM reasoning that leverages the Martingale property from Bayesian statistics. This property implies that, under rational belief updating, the expected value of future beliefs equals the current belief; that is, belief updates are unpredictable from the current belief. We propose the Martingale Score, an unsupervised, regression-based metric that measures violations of this property, which signal a departure from Bayesian updating on new evidence. In open-ended problem domains including event forecasting, value-laden questions, and academic paper review, we find such violations to be widespread across models and setups: the current belief positively predicts future belief updates, a phenomenon we term belief entrenchment. We identify the models, reasoning techniques, and domains that are more prone to belief entrenchment. Finally, we validate the Martingale Score by showing that it predicts ground-truth accuracy in problem domains where ground-truth labels are available. This indicates that, although it is designed as an unsupervised metric that operates even in domains without ground truth, the Martingale Score is a useful proxy for the truth-seeking ability of a reasoning process.
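The regression idea behind the Martingale Score can be illustrated with a short sketch. It rests on assumptions the abstract does not spell out: each belief is a scalar probability recorded before (prior) and after (posterior) a reasoning step, and the score is taken to be the ordinary least-squares slope of the update (posterior minus prior) regressed on the prior. Under the Martingale property the expected update is zero whatever the prior, so the slope should be near 0. A positive slope means higher current beliefs predict further upward updates, which is the entrenchment pattern described above. The function name `martingale_score` is hypothetical, and the paper's exact specification (belief scale, centering, controls) may differ.

```python
from typing import Sequence


def martingale_score(prior: Sequence[float], posterior: Sequence[float]) -> float:
    """Hypothetical sketch: OLS slope of (posterior - prior) regressed on prior.

    Assumption: this simple slope stands in for the paper's regression-based
    score. Under rational (martingale) updating, E[posterior | prior] = prior,
    so the expected update is 0 for every prior and the slope should be ~0.
    A positive slope indicates belief entrenchment.
    """
    if len(prior) != len(posterior):
        raise ValueError("prior and posterior must have the same length")
    n = len(prior)
    if n < 2:
        raise ValueError("need at least two observations")

    # Belief update for each question: how far the belief moved.
    updates = [q - p for p, q in zip(prior, posterior)]

    mean_p = sum(prior) / n
    mean_u = sum(updates) / n

    # Slope = Cov(prior, update) / Var(prior) (the 1/n factors cancel).
    var_p = sum((p - mean_p) ** 2 for p in prior)
    if var_p == 0:
        raise ValueError("prior beliefs must vary to fit a slope")
    cov = sum((p - mean_p) * (u - mean_u) for p, u in zip(prior, updates))
    return cov / var_p


# Beliefs drift away from 0.5 in the direction they already lean.
print(martingale_score([0.2, 0.4, 0.6, 0.8], [0.1, 0.35, 0.65, 0.9]))  # ≈ 0.35
```

In this toy example beliefs below 0.5 fall further and beliefs above 0.5 rise further, so the slope is positive; if every posterior equaled its prior, the slope would be exactly 0, and updates that pull beliefs back toward the middle would give a negative slope.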