An Exploration of Higher Education Course Evaluation by Large Language Models

Course evaluation plays a critical role in ensuring instructional quality and guiding curriculum development in higher education. However, traditional evaluation methods, such as student surveys, classroom observations, and expert reviews, are often constrained by subjectivity, high labor costs, and limited scalability. With recent advancements in large language models (LLMs), new opportunities have emerged for generating consistent, fine-grained, and scalable course evaluations. This study investigates the use of three representative LLMs for automated course evaluation at both the micro level (classroom discussion analysis) and the macro level (holistic course review). Using classroom interaction transcripts and a dataset of 100 courses from a major institution in China, we demonstrate that LLMs can extract key pedagogical features and generate structured evaluation results aligned with expert judgement. A fine-tuned version of Llama shows superior reliability, producing score distributions with greater differentiation and stronger correlation with human evaluators than its counterparts. The results highlight three major findings: (1) LLMs can reliably perform systematic and interpretable course evaluations at both the micro and macro levels; (2) fine-tuning and prompt engineering significantly enhance evaluation accuracy and consistency; and (3) LLM-generated feedback provides actionable insights for teaching improvement. These findings illustrate the promise of LLM-based evaluation as a practical tool for supporting quality assurance and educational decision-making in large-scale higher education settings.

