This study addresses class imbalance in requirements engineering by applying the SMOTE-Tomek preprocessing technique, combined with stratified K-fold cross-validation, to the PROMISE dataset. The dataset comprises 969 labeled requirements, categorized as functional or non-functional. The proposed approach improves the representation of minority classes while maintaining the integrity of the validation folds, leading to a notable improvement in classification accuracy: logistic regression achieved 76.16\% accuracy, well above the baseline of 58.31\%. These results highlight the applicability and efficiency of machine learning models as scalable and interpretable solutions for requirements classification.
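The interaction between resampling and cross-validation described above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: it assumes that "maintaining the integrity of validation folds" means SMOTE-Tomek is applied to the training portion of each stratified fold only, it uses synthetic 2-D points as a stand-in for vectorized requirement text, and it fits no classifier. All function names (`stratified_folds`, `smote`, `remove_tomek_links`, `resampled_splits`) are hypothetical; in practice a library implementation such as `SMOTETomek` from imbalanced-learn would normally be used instead.

```python
import random
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def stratified_folds(labels, k, seed=0):
    """Deal the indices of each class round-robin into k folds,
    so every fold keeps roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds


def smote(X, y, k_neighbors=5, seed=0):
    """Simplified SMOTE: grow every smaller class to the largest class size
    by interpolating between a sample and one of its nearest same-class
    neighbours. Assumes each class has at least two samples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(pts) for pts in by_class.values())
    X_new, y_new = list(X), list(y)
    for label, pts in by_class.items():
        for _ in range(target - len(pts)):
            base = rng.choice(pts)
            others = [p for p in pts if p is not base]
            neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k_neighbors]
            other = rng.choice(neighbours)
            gap = rng.random()
            X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_new.append(label)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link, i.e. every pair of
    opposite-class samples that are each other's nearest neighbour."""
    n = len(X)
    nn = [min((j for j in range(n) if j != i), key=lambda j: sq_dist(X[i], X[j]))
          for i in range(n)]
    drop = {i for i in range(n) if y[nn[i]] != y[i] and nn[nn[i]] == i}
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def resampled_splits(X, y, k=5, seed=0):
    """Yield (X_train, y_train, X_val, y_val) per fold. Only the training
    part is resampled; the validation part is returned untouched."""
    folds = stratified_folds(y, k, seed)
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        X_res, y_res = remove_tomek_links(*smote(X_tr, y_tr, seed=seed + f))
        yield X_res, y_res, [X[i] for i in val_idx], [y[i] for i in val_idx]


# Synthetic, imbalanced toy data (90 vs. 20 samples); NOT the PROMISE dataset.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 90 + [1] * 20

splits = list(resampled_splits(X, y, k=5))
for X_res, y_res, X_val, y_val in splits:
    print("train:", dict(Counter(y_res)), "validation:", dict(Counter(y_val)))
```

Resampling inside the loop, rather than once before splitting, is the design choice worth noting: synthetic points interpolated from a sample never end up in the fold that validates that sample, so each validation fold reflects the original class distribution.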