While large language models (LLMs) have demonstrated remarkable versatility across a wide range of general tasks, their effectiveness often diminishes in domain-specific applications because of inherent knowledge gaps. Their performance also typically declines on complex problems that require multi-step reasoning and analysis. In response to these challenges, we propose combining LLMs and AI agents to develop educational assistants that enhance undergraduate learning in biomechanics courses focused on analyzing forces and moments in the human musculoskeletal system. To this end, we construct a dual-module framework to improve LLM performance on biomechanics educational tasks: 1) we apply Retrieval-Augmented Generation (RAG) to improve the specificity and logical consistency of LLM responses to conceptual true/false questions; 2) we build a Multi-Agent System (MAS) to solve calculation-oriented problems involving multi-step reasoning and code execution. Specifically, we evaluate the performance of several LLMs, i.e., Qwen-1.0-32B, Qwen-2.5-32B, and Llama-70B, on a biomechanics dataset comprising 100 true/false conceptual questions and problems requiring equation derivation and calculation. Our results show that RAG significantly enhances the performance and stability of LLMs in answering conceptual questions relative to the vanilla models. The MAS, constructed from multiple LLMs, is able to perform multi-step reasoning, derive equations, execute code, and generate explainable solutions for tasks that require calculation. These findings demonstrate the potential of applying RAG and MAS to enhance LLM performance in specialized courses within engineering curricula, and they point to a promising direction for developing intelligent tutoring systems in engineering education.