Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams

The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities. This exam poses challenging tasks for LMs, since a single question may span multiple fields of knowledge and require understanding information from diverse domains. For instance, a question may require comprehension of both statistics and biology to be solved. This work analyzed responses generated by GPT-3.5 and GPT-4 models for questions from the 2009-2017 exams, as well as for questions from the 2022 exam, which were made public after the training of the models was completed. Furthermore, different prompting strategies were tested, including Chain-of-Thought (CoT) prompts that generate explanations for the answers. On the 2022 edition, the best-performing model, GPT-4 with CoT, achieved an accuracy of 87%, surpassing GPT-3.5 by 11 percentage points. The code and data used in the experiments are available at https://github.com/piresramon/gpt-4-enem.
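Scoring CoT outputs on a multiple-choice exam requires pulling a single option letter out of a free-form explanation before computing accuracy. The following is a minimal sketch of that step, not the authors' code (see the repository above for their actual parsing). It assumes, hypothetically, that the prompt asks the model to end its reasoning with a line such as `Resposta: C`; the helper names `extract_choice` and `accuracy` are illustrative. ENEM items offer five options, A to E.

```python
import re
from typing import Optional, Sequence

# Hypothetical answer convention (an assumption, not the paper's format):
# the model ends its chain-of-thought with e.g. "Resposta: C" or "Resposta: (C)".
# \b prevents matching the first letter of a word such as "Certamente".
ANSWER_PATTERN = re.compile(r"[Rr]esposta:\s*\(?([A-E])\b")


def extract_choice(completion: str) -> Optional[str]:
    """Return the last option letter (A-E) announced in a completion, or None."""
    matches = ANSWER_PATTERN.findall(completion)
    return matches[-1] if matches else None


def accuracy(completions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose extracted letter matches the answer key.

    Completions without a recognizable answer line are counted as wrong.
    """
    if len(completions) != len(gold):
        raise ValueError("completions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        extract_choice(text) == key.upper() for text, key in zip(completions, gold)
    )
    return correct / len(gold)


if __name__ == "__main__":
    outputs = [
        "A velocidade dobra, logo... Resposta: C",
        "Analisando as alternativas... Resposta: (B)",
        "Resposta: E",
        "Sem resposta.",
    ]
    print(accuracy(outputs, ["C", "D", "E", "A"]))  # prints 0.5
```

Taking the last match rather than the first is a design choice: a CoT explanation may mention several options while reasoning, and the final announced letter is the most likely commitment.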
