大语言模型能解释医学问题吗? (Can large language models reason about medical questions?) - 专知论文

会员服务 ·

0

语言模型化 · CoT · 知识 (knowledge) · Performer · MoDELS ·

2023 年 1 月 24 日

Can large language models reason about medical questions?

翻译：大语言模型能解释医学问题吗?

Valentin Liévin,Christoffer Egeberg Hother,Ole Winther

from arxiv, 33 pages, 6 figures, to be submitted. v1: results using InstructGPT, v2: added the Codex experiments, v3: added the missing test MedMCQA results for Codex 5-shot CoT and using k=100 samples

Although large language models (LLMs) often produce impressive outputs, it remains unclear how they perform in real-world scenarios requiring strong reasoning skills and expert domain knowledge. We set out to investigate whether GPT-3.5 (Codex and InstructGPT) can be applied to answer and reason about difficult real-world-based questions. We utilize two multiple-choice medical exam questions (USMLE and MedMCQA) and a medical reading comprehension dataset (PubMedQA). We investigate multiple prompting scenarios: Chain-of-Thought (CoT, think step-by-step), zero- and few-shot (prepending the question with question-answer exemplars) and retrieval augmentation (injecting Wikipedia passages into the prompt). For a subset of the USMLE questions, a medical expert reviewed and annotated the model's CoT. We found that InstructGPT can often read, reason and recall expert knowledge. Failure are primarily due to lack of knowledge and reasoning errors and trivial guessing heuristics are observed, e.g.\ too often predicting labels A and D on USMLE. Sampling and combining many completions overcome some of these limitations. Using 100 samples, Codex 5-shot CoT not only gives close to well-calibrated predictive probability but also achieves human-level performances on the three datasets. USMLE: 60.2%, MedMCQA: 62.7% and PubMedQA: 78.2%.

翻译：虽然大型语言模型(LLMS)往往产生令人印象深刻的产出,但人们仍然不清楚它们在现实世界情景中如何运作,需要很强的推理技巧和专家领域知识。我们准备调查GPT-3.5(Codex和教官GPT)是否可以用于回答和解释基于现实的难题。我们使用两个多重选择的医学考试问题(USMLE和MedMCQA)和医学阅读理解数据集(PubMedQA)。我们调查了多种提示性情景:链路(CoT, 逐步思考),零和少见(先用问答表解器)和检索增强(将维基百科百分解的段落输入到快速。对于美国MLE问题的一部分,一位医学专家经过审查和附加了模型的 CoT。我们发现,GPT常常可以阅读、解释和回忆专家知识。我们调查失败的主要原因是缺乏知识和推理错误,人们也只是粗略地猜测,例如,太经常预测USMLE的A和DPIML2的等级:Sambal-Cs 3的精确的精确的精确性能度,也使这些模型得以克服了。

0

相关内容

语言模型化

语言模型化

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

125+阅读 · 2023年1月29日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

41+阅读 · 2022年6月30日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

123+阅读 · 2020年7月18日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

56+阅读 · 2020年3月14日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

77+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

微纳尺度多孔介质中气体运移机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

IL-32/Integrins/FAK通路在肝纤维化形成中的作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

稀土金属团簇的全局最优结构预测及电子结构分析

国家自然科学基金

0+阅读 · 2013年12月31日

BRCA1蛋白出核的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

线粒体自噬在2型糖尿病发病机制中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

斑马鱼β-catenin核转运的调控机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

ART: Automatic multi-step reasoning and tool-use for large language models

Arxiv

1+阅读 · 2023年3月16日

Can Very Large Pretrained Language Models Learn Storytelling With A Few Examples?

Arxiv

0+阅读 · 2023年3月15日

A Large and Diverse Arabic Corpus for Language Modeling

Arxiv

0+阅读 · 2023年3月15日

Chat with the Environment: Interactive Multimodal Perception using Large Language Models

Arxiv

0+阅读 · 2023年3月14日

Task and Motion Planning with Large Language Models for Object Rearrangement

Arxiv

0+阅读 · 2023年3月14日

Towards Reasoning in Large Language Models: A Survey

Arxiv

28+阅读 · 2022年12月20日

A Survey of Knowledge Enhanced Pre-trained Models

Arxiv

28+阅读 · 2021年10月1日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Arxiv

20+阅读 · 2021年5月27日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

VIP会员

文章信息

相关主题

语言模型化

知识 (knowledge)

相关VIP内容

【2023新书】使用Python进行统计和数据可视化，554页pdf

【2023新书】使用Python进行统计和数据可视化，554页pdf

专知会员服务

125+阅读 · 2023年1月29日

自然语言处理顶会NAACL2022最佳论文出炉！

自然语言处理顶会NAACL2022最佳论文出炉！

专知会员服务

41+阅读 · 2022年6月30日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

123+阅读 · 2020年7月18日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

56+阅读 · 2020年3月14日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

53+阅读 · 2019年10月17日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

90+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

77+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

64+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

39+阅读 · 2019年10月9日

热门VIP内容

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

征稿 | CFP：Special Issue of NLP and KG(JCR Q2，IF2.67)

开放知识图谱

1+阅读 · 2022年4月4日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

23+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

25+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

41+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

16+阅读 · 2018年12月24日

相关论文

ART: Automatic multi-step reasoning and tool-use for large language models

Arxiv

1+阅读 · 2023年3月16日

Can Very Large Pretrained Language Models Learn Storytelling With A Few Examples?

Arxiv

0+阅读 · 2023年3月15日

A Large and Diverse Arabic Corpus for Language Modeling

Arxiv

0+阅读 · 2023年3月15日

Chat with the Environment: Interactive Multimodal Perception using Large Language Models

Arxiv

0+阅读 · 2023年3月14日

Task and Motion Planning with Large Language Models for Object Rearrangement

Arxiv

0+阅读 · 2023年3月14日

Towards Reasoning in Large Language Models: A Survey

Arxiv

28+阅读 · 2022年12月20日

A Survey of Knowledge Enhanced Pre-trained Models

Arxiv

28+阅读 · 2021年10月1日

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

Arxiv

11+阅读 · 2021年9月3日

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Arxiv

20+阅读 · 2021年5月27日

Differentiable Reasoning on Large Knowledge Bases and Natural Language

Arxiv

12+阅读 · 2019年12月17日

相关基金

微纳尺度多孔介质中气体运移机理研究

国家自然科学基金

0+阅读 · 2015年12月31日

Anderson型多酸的不对称修饰及可控组装研究

国家自然科学基金

1+阅读 · 2014年12月31日

IL-32/Integrins/FAK通路在肝纤维化形成中的作用研究

国家自然科学基金

0+阅读 · 2013年12月31日

Calderon问题和边界刚性问题

国家自然科学基金

0+阅读 · 2013年12月31日

Partial Spread Bent函数与Bent-Negabent函数的构造及密码学性质研究

国家自然科学基金

0+阅读 · 2013年12月31日

稀土金属团簇的全局最优结构预测及电子结构分析

国家自然科学基金

0+阅读 · 2013年12月31日

BRCA1蛋白出核的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

线粒体自噬在2型糖尿病发病机制中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

斑马鱼β-catenin核转运的调控机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员