Identifying the factors that influence student performance in basic education is a central challenge for formulating effective public policy in Brazil. This study introduces a multi-level machine learning approach that classifies the proficiency of 9th-grade and high school students using microdata from the System of Assessment of Basic Education (SAEB). Our model integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and principal management profiles. In a comparative analysis of four ensemble algorithms, Random Forest performed best, achieving 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using SHAP, which identified the school's average socioeconomic level as the dominant predictor, indicating that systemic factors carry more weight than individual characteristics considered in isolation. The primary conclusion is that academic performance is a systemic phenomenon closely tied to the school's ecosystem. The study provides a data-driven, interpretable tool to inform policies that promote educational equity by addressing disparities between schools.
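The multi-level integration described in the abstract can be illustrated with a minimal sketch. It assumes that the four sources share a school identifier and that teacher records are averaged per school; the abstract does not state how the join was actually performed. Every field name and value below (`school_id`, `ses`, `years_experience`, `has_library`, `principal_years_in_role`) is hypothetical toy data, not a SAEB column or a result from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records; field names are illustrative, not SAEB columns.
students = [
    {"student_id": 1, "school_id": "A", "ses": 0.8, "proficient": 1},
    {"student_id": 2, "school_id": "A", "ses": 0.6, "proficient": 1},
    {"student_id": 3, "school_id": "B", "ses": 0.2, "proficient": 0},
    {"student_id": 4, "school_id": "B", "ses": 0.4, "proficient": 0},
]
teachers = [
    {"school_id": "A", "years_experience": 10},
    {"school_id": "A", "years_experience": 20},
    {"school_id": "B", "years_experience": 4},
]
schools = {"A": {"has_library": 1}, "B": {"has_library": 0}}
principals = {
    "A": {"principal_years_in_role": 6},
    "B": {"principal_years_in_role": 1},
}


def group_mean(rows, key, value):
    """Average `value` within each group identified by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: mean(v) for k, v in groups.items()}


def build_feature_rows(students, teachers, schools, principals):
    """Return one flat feature dict per student, joined on school_id."""
    # School-level aggregates derived from the lower-level records.
    school_mean_ses = group_mean(students, "school_id", "ses")
    teacher_mean_exp = group_mean(teachers, "school_id", "years_experience")

    rows = []
    for student in students:
        sid = student["school_id"]
        row = dict(student)                       # student-level features
        row["school_mean_ses"] = school_mean_ses[sid]
        row["teacher_mean_experience"] = teacher_mean_exp[sid]
        row.update(schools[sid])                  # school indicators
        row.update(principals[sid])               # principal profile
        rows.append(row)
    return rows


rows = build_feature_rows(students, teachers, schools, principals)
for row in rows:
    print(row)
```

Each resulting row carries both the student's own socioeconomic value and the school-level average, which is the kind of feature the SHAP analysis singles out. Rows flattened this way could then be passed to an ensemble classifier such as a Random Forest.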