Evaluating NLP Embedding Models for Handling Science-Specific Symbolic Expressions in Student Texts

In recent years, natural language processing (NLP) has become integral to educational data mining, particularly in the analysis of student-generated language products. For research and assessment purposes, so-called embedding models are typically employed to generate numeric representations of text that capture its semantic content for use in subsequent quantitative analyses. Yet when it comes to science-related language, symbolic expressions such as equations and formulas introduce challenges that current embedding models struggle to address. Existing research studies and practical applications often either overlook these challenges or remove symbolic expressions altogether, potentially leading to biased research findings and diminished performance of practical applications. This study therefore explores how contemporary embedding models differ in their capability to process and interpret science-related symbolic expressions. To this end, various embedding models are evaluated using physics-specific symbolic expressions drawn from authentic student responses, with performance assessed via two approaches: 1) similarity-based analyses and 2) integration into a machine learning pipeline. Our findings reveal significant differences in model performance, with OpenAI's GPT-text-embedding-3-large outperforming all other examined models, though its advantage over other models was moderate rather than decisive. Overall, this study underscores the importance for educational data mining researchers and practitioners of carefully selecting NLP embedding models when working with science-related language products that include symbolic expressions. The code and (partial) data are available at https://doi.org/10.17605/OSF.IO/6XQVG.
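To make the first evaluation approach (similarity-based analysis) more concrete, the following is a minimal sketch of how embeddings of symbolic expressions might be compared using cosine similarity. It is not the authors' code. The function `toy_embed` is a hypothetical stand-in based on character bigram counts, used only so the example runs without any external model; in practice it would be replaced by a call to a real embedding model. The example expressions and the helper `rank_by_similarity` are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def toy_embed(text, n=2):
    """Hypothetical stand-in for an embedding model (assumption, not from the study).

    Represents a string as counts of its character n-grams after removing spaces.
    """
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, embed=toy_embed):
    """Return (candidate, similarity) pairs sorted from most to least similar."""
    q = embed(query)
    scored = [(c, cosine_similarity(q, embed(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative expressions only; these are not taken from the study's data.
ranking = rank_by_similarity("F = m*a", ["F=ma", "E = m*c^2", "kinetic energy"])
for candidate, score in ranking:
    print(f"{candidate!r}: {score:.3f}")
```

A model that handles symbolic expressions well would be expected to place notational variants of the same formula close together. Under the second approach described in the abstract, the same vectors would instead serve as input features to a downstream machine learning model.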
