Automatic, objective, non-invasive detection of pathological voice based on computerized analysis of acoustic signals can play an important role in early diagnosis, progression tracking, and even effective treatment of voice pathologies. In search of a robust voice pathology detection system, we investigated three distinct classifiers within the supervised learning and anomaly detection paradigms. We conducted a set of experiments using a variety of input data such as raw waveforms, spectrograms, mel-frequency cepstral coefficients (MFCC), and conventional acoustic (dysphonic) features (AF). In comparison with previously published work, this article is the first to use a combination of four different databases comprising normophonic and pathological recordings of sustained phonation of the vowel /a/, without restriction to a subset of vocal pathologies. Furthermore, to the best of our knowledge, this article is the first to explore gradient boosted trees and deep learning for this application. The best classification performance of each classifier, measured by F1 score on a dedicated test set, was: XGBoost (0.733) using AF and MFCC, DenseNet (0.621) using MFCC, and Isolation Forest (0.610) using AF. Although these results are exploratory, the experiments show the promising potential of gradient boosting and deep learning methods for robust detection of voice pathologies.
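For readers unfamiliar with the reported metric, the F1 score is the harmonic mean of precision and recall for one class. The following is a minimal sketch in plain Python, not the evaluation code used in this work. The label vectors are invented for illustration, and treating the pathological class as the positive class (1) is an assumption, since the abstract does not state which class the score refers to.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score of the `positive` class (0.0 if there are no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (assumption): 1 = pathological, 0 = normophonic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

score = f1_score(y_true, y_pred)
print(round(score, 3))
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is the F1 score.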