Deep Transfer Learning based COVID-19 Detection in Cough, Breath and Speech using Bottleneck Features

We present an experimental investigation into the automatic detection of COVID-19 from coughs, breaths and speech. This type of screening is non-contact, does not require specialist medical expertise or laboratory facilities, and can easily be deployed on inexpensive consumer hardware. Smartphone recordings of cough, breath and speech from subjects around the globe are classified by seven standard machine learning classifiers using leave-$p$-out cross-validation to provide a promising baseline performance. Then, a diverse dataset of 10.29 hours of cough, sneeze, speech and noise audio recordings is used to pre-train CNN, LSTM and Resnet50 classifiers, which are then fine-tuned to improve performance further. We also extract bottleneck features from these pre-trained models by removing the final two layers and use them as input to LR, SVM, MLP and KNN classifiers to detect the COVID-19 signature. The highest AUC of 0.98 was achieved using a transfer learning based Resnet50 architecture on coughs from the Coswara dataset. For breath and speech, the highest AUCs of 0.94 and 0.92 respectively were achieved by an SVM run on bottleneck features extracted from breaths in the Coswara dataset and speech recordings in the ComParE dataset. We conclude that, among all vocal audio, coughs carry the strongest COVID-19 signature, followed by breath and speech, and that transfer learning improves classifier performance, yielding higher AUC and lower variance across the cross-validation folds. Although these signatures are not perceivable by the human ear, machine learning based COVID-19 detection is possible from vocal audio recorded via smartphone.
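The evaluation protocol named in the abstract (leave-$p$-out cross-validation scored by AUC) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names `leave_p_out` and `auc`, the value `p=2`, and the toy labels and scores are all assumptions made for illustration. The abstract does not state the value of $p$ or whether samples are held out per recording or per subject.

```python
from itertools import combinations


def leave_p_out(n_samples, p):
    """Yield (train_indices, test_indices) for every way of holding out p samples."""
    indices = range(n_samples)
    for test in combinations(indices, p):
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, list(test)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count positive/negative pairs in which the positive sample scores higher.
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = COVID-19 positive, 0 = negative.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.8]

splits = list(leave_p_out(len(labels), p=2))
print(len(splits))          # one fold per choice of 2 held-out samples
print(auc(labels, scores))  # fraction of correctly ranked positive/negative pairs
```

Because the number of folds grows combinatorially with the dataset size, exhaustive leave-$p$-out is only practical for small $p$ or small datasets. In a real experiment each fold would train a classifier on the `train` indices and score the `test` indices before computing the AUC.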
