语言模型是否具有理性？——关于一致性规范与信念修正的案例研究 (Are language models rational? The case of coherence norms and belief revision) - 专知论文

会员服务 ·

0

一致 · 语言模型 · 信念修正 · 机器学习模型 · 学习模型 ·

Are language models rational? The case of coherence norms and belief revision

翻译：语言模型是否具有理性？——关于一致性规范与信念修正的案例研究

Thomas Hofweber,Peter Hase,Elias Stengel-Eskin,Mohit Bansal

from arxiv, substantial expansions of sections 4 and 5, updated references, numerous smaller additions and clarifications

Do norms of rationality apply to machine learning models, in particular language models? In this paper we investigate this question by focusing on a special subset of rational norms: coherence norms. We consider both logical coherence norms as well as coherence norms tied to the strength of belief. To make sense of the latter, we introduce the Minimal Assent Connection (MAC) and propose a new account of credence, which captures the strength of belief in language models. This proposal uniformly assigns strength of belief simply on the basis of model internal next token probabilities. We argue that rational norms tied to coherence do apply to some language models, but not to others. This issue is significant since rationality is closely tied to predicting and explaining behavior, and thus it is connected to considerations about AI safety and alignment, as well as understanding model behavior more generally.

翻译：理性规范是否适用于机器学习模型，特别是语言模型？本文通过聚焦于理性规范的一个特殊子集——一致性规范来探讨这一问题。我们同时考察逻辑一致性规范以及与信念强度相关的一致性规范。为理解后者，我们引入了最小认同关联（MAC）并提出一种新的置信度解释框架，该框架能够捕捉语言模型中的信念强度。该方案基于模型内部的下一个词元概率统一分配信念强度。我们认为，与一致性相关的理性规范适用于某些语言模型，但不适用于其他模型。这一问题具有重要意义，因为理性与行为预测和解释密切相关，因此涉及人工智能安全与对齐的考量，以及对模型行为的更广泛理解。

0

相关内容

大模型如何可解释？新泽西理工学院等最新《大型语言模型可解释性》综述

大模型如何可解释？新泽西理工学院等最新《大型语言模型可解释性》综述

专知会员服务

97+阅读 · 2023年9月11日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

【ICCV2021】参数化对比学习

专知会员服务

33+阅读 · 2021年7月27日

【NeurIPS2020】无限可能的联合对比学习

专知会员服务

29+阅读 · 2020年10月2日

【ACL2020-密歇根州立大学】语言和视觉推理的跨模态关联

【ACL2020-密歇根州立大学】语言和视觉推理的跨模态关联

专知会员服务

57+阅读 · 2020年5月14日

《面向军事应用的数据驱动的行为建模》荷兰应用科学研究组织（NTO）

《面向军事应用的数据驱动的行为建模》荷兰应用科学研究组织（NTO）

专知

50+阅读 · 2022年6月2日

【CVPR2020-北京大学】自适应间隔损失的提升小样本学习

【CVPR2020-北京大学】自适应间隔损失的提升小样本学习

专知

12+阅读 · 2020年6月9日

论文浅尝 | Interaction Embeddings for Prediction and Explanation

论文浅尝 | Interaction Embeddings for Prediction and Explanation

开放知识图谱

11+阅读 · 2019年2月1日

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

专知

18+阅读 · 2018年4月2日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

粗糙回归模型与算法研究

国家自然科学基金

8+阅读 · 2015年12月31日

测量误差数据下部分线性模型有约束统计推断理论

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

基于非对称群体兴趣相关性并融合情境与群体信任的Web服务推荐研究

国家自然科学基金

1+阅读 · 2015年12月31日

相依重尾随机变量和的渐近性及其在更新风险模型中的应用

国家自然科学基金

0+阅读 · 2014年12月31日

Complementary strengths of the Neyman-Rubin and graphical causal frameworks

Arxiv

0+阅读 · 12月9日

An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems

Arxiv

0+阅读 · 12月4日

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

Arxiv

0+阅读 · 11月21日

From Confidence to Collapse in LLM Factual Robustness

Arxiv

0+阅读 · 11月20日

Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs

Arxiv

0+阅读 · 11月13日

VIP会员

文章信息

相关主题

机器学习模型

相关VIP内容

大模型如何可解释？新泽西理工学院等最新《大型语言模型可解释性》综述

大模型如何可解释？新泽西理工学院等最新《大型语言模型可解释性》综述

专知会员服务

97+阅读 · 2023年9月11日

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

【伯克利JD Co-Reyes博士论文】建立强化学习算法泛化:从潜在动力学模型到元学习，Building Reinforcement Learning Algorithms that Generalize: From Latent Dynamics Models to Meta-Learning

专知会员服务

45+阅读 · 2022年3月6日

【ICCV2021】参数化对比学习

专知会员服务

33+阅读 · 2021年7月27日

【NeurIPS2020】无限可能的联合对比学习

专知会员服务

29+阅读 · 2020年10月2日

【ACL2020-密歇根州立大学】语言和视觉推理的跨模态关联

【ACL2020-密歇根州立大学】语言和视觉推理的跨模态关联

专知会员服务

57+阅读 · 2020年5月14日

热门VIP内容

开通专知VIP会员享更多权益服务

《利用人工智能对军事行动进行建模》

《利用人工智能学习、优化与推演美国海军作战部队的战略布局与分散（续文）》

机器人、无人机与实时影像：应对城市爆炸威胁的三大技术方案

《指挥官意图消息中关键概念自动提取》最新47页

相关资讯

《面向军事应用的数据驱动的行为建模》荷兰应用科学研究组织（NTO）

《面向军事应用的数据驱动的行为建模》荷兰应用科学研究组织（NTO）

专知

50+阅读 · 2022年6月2日

【CVPR2020-北京大学】自适应间隔损失的提升小样本学习

【CVPR2020-北京大学】自适应间隔损失的提升小样本学习

专知

12+阅读 · 2020年6月9日

论文浅尝 | Interaction Embeddings for Prediction and Explanation

论文浅尝 | Interaction Embeddings for Prediction and Explanation

开放知识图谱

11+阅读 · 2019年2月1日

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

【推荐系统论文笔记】DKN: 基于深度知识感知的新闻推荐网络（WWW2018 ）

专知

18+阅读 · 2018年4月2日

MNIST入门：贝叶斯方法

MNIST入门：贝叶斯方法

Python程序员

23+阅读 · 2017年7月3日

相关论文

Complementary strengths of the Neyman-Rubin and graphical causal frameworks

Arxiv

0+阅读 · 12月9日

An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems

Arxiv

0+阅读 · 12月4日

Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

Arxiv

0+阅读 · 11月21日

From Confidence to Collapse in LLM Factual Robustness

Arxiv

0+阅读 · 11月20日

Guess or Recall? Training CNNs to Classify and Localize Memorization in LLMs

Arxiv

0+阅读 · 11月13日

相关基金

粗糙回归模型与算法研究

国家自然科学基金

8+阅读 · 2015年12月31日

测量误差数据下部分线性模型有约束统计推断理论

国家自然科学基金

2+阅读 · 2015年12月31日

基于自主学习的Ad hoc Agent序贯决策研究

国家自然科学基金

46+阅读 · 2015年12月31日

基于非对称群体兴趣相关性并融合情境与群体信任的Web服务推荐研究

国家自然科学基金

1+阅读 · 2015年12月31日

相依重尾随机变量和的渐近性及其在更新风险模型中的应用

国家自然科学基金

0+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员