Herein, we compared the performance of SVM and MLP classifiers for emotion recognition using the speech and song channels of the RAVDESS dataset. We extracted a range of audio features and identified the optimal scaling strategies and hyperparameters for each model. To increase the sample size, we performed audio data augmentation, and we addressed class imbalance using SMOTE. Our results indicate that the optimised SVM outperforms the MLP, with an accuracy of 82% compared to 75%. Following data augmentation, the two algorithms performed near-identically at ~79%; however, overfitting was evident for the SVM. Our final experiments showed that SVM and MLP behaved similarly across channels, with both achieving lower accuracy on the speech channel than on the song channel. These findings suggest that both SVM and MLP are powerful classifiers for emotion recognition, in a vocal-channel-dependent manner.
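To make the class-imbalance step concrete, the core of SMOTE can be sketched as interpolation between a minority-class sample and one of its k nearest minority neighbours. The function below is a minimal, numpy-only illustration of that idea (the function name and parameters are our own, not from the original pipeline); in practice one would use an established implementation such as `imblearn.over_sampling.SMOTE`.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch (illustrative, not the original pipeline):
    synthesize n_new minority-class samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]         # k nearest neighbours per sample
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                   # pick a minority sample at random
        b = nn[a, rng.integers(min(k, n - 1))]  # pick one of its neighbours
        gap = rng.random()                    # interpolation factor in [0, 1]
        out[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return out
```

Because each synthetic point lies on the segment between two real minority samples, the oversampled class stays inside the region spanned by the original data rather than duplicating existing rows.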