How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports on a first attempt at an AI teacher test. We built a solution around the insight that you can run conversational agents in parallel to human teachers in real-world dialogues, simulate how different agents would respond to a student, and compare these counterpart responses in terms of three abilities: speak like a teacher, understand a student, help a student. Our method builds on the reliability of comparative judgments in education and uses a probabilistic model and Bayesian sampling to infer estimates of pedagogical ability. We find that, even though conversational agents (Blender in particular) perform well on conversational uptake, they are quantifiably worse than real teachers on several pedagogical dimensions, especially with regard to helpfulness (Blender: Δ ability = -0.75; GPT-3: Δ ability = -0.93).
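The abstract names the method only in outline: a probabilistic model over comparative judgments, with Bayesian sampling used to infer ability estimates. The sketch below is a minimal illustration of that idea, not the paper's exact model: it fits a Bradley-Terry-style comparative-judgment model with a random-walk Metropolis sampler, where each respondent (human teacher or agent) has a latent ability and the probability that one response is preferred over another depends on the ability difference. The toy comparison data, respondent labels, priors, and sampler settings are all assumptions for illustration.

```python
# Hedged sketch (assumed details, not the authors' exact model): a
# Bradley-Terry-style comparative-judgment model sampled with Metropolis.
# Each respondent gets a latent ability; P(a preferred over b) = sigmoid(diff).
import numpy as np

rng = np.random.default_rng(0)

# Toy pairwise judgments: (index_a, index_b, a_won), with illustrative labels
# 0 = human teacher, 1 = Blender, 2 = GPT-3.
comparisons = [(0, 1, 1), (0, 1, 1), (0, 2, 1), (0, 2, 1), (1, 2, 1), (0, 1, 0)]
n_respondents = 3

def log_posterior(ability):
    """Standard-normal prior on abilities plus Bradley-Terry likelihood."""
    logp = -0.5 * np.sum(ability ** 2)            # N(0, 1) prior
    for a, b, a_won in comparisons:
        p_a = 1.0 / (1.0 + np.exp(-(ability[a] - ability[b])))
        logp += np.log(p_a if a_won else 1.0 - p_a)
    return logp

# Random-walk Metropolis over the ability vector.
ability = np.zeros(n_respondents)
current_lp = log_posterior(ability)
samples = []
for step in range(20_000):
    proposal = ability + rng.normal(scale=0.3, size=n_respondents)
    proposal_lp = log_posterior(proposal)
    if np.log(rng.uniform()) < proposal_lp - current_lp:
        ability, current_lp = proposal, proposal_lp
    if step >= 5_000:                              # discard burn-in
        samples.append(ability.copy())

posterior_mean = np.array(samples).mean(axis=0)
# Δ ability of each agent relative to the human teacher (index 0).
print("Δ ability vs. teacher:", posterior_mean[1:] - posterior_mean[0])
```

With comparison data in which the human teacher is preferred most of the time, the agents' posterior mean abilities come out below the teacher's, which is the sense in which the reported Δ ability values are negative.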