Legal interpretation frequently involves assessing how a legal text, as understood by an 'ordinary' speaker of the language, applies to the facts of a legal dispute in the U.S. judicial system. Recent scholarship has proposed that legal practitioners add large language models (LLMs) to their interpretive toolkit. This work offers an empirical argument against LLM interpretation as recently practiced by legal scholars and federal judges. Our investigation in English shows that models do not provide stable interpretive judgments: varying the question format can lead a model to wildly different conclusions. Moreover, the models show only weak to moderate correlation with human judgment, with large variance across models and question variants, suggesting that it is dangerous to give much credence to the conclusions produced by generative AI.