Detecting synthetic speech is challenging when labeled data are scarce and recording conditions vary. Existing end-to-end deep models often overfit or fail to generalize, and while kernel methods can remain competitive, their performance depends heavily on the chosen kernel. Here, we show that using a quantum kernel in audio deepfake detection reduces false-positive rates without increasing model size. Quantum feature maps embed data into high-dimensional Hilbert spaces, enabling the use of expressive similarity measures and compact classifiers. Building on this motivation, we compare quantum-kernel SVMs (QSVMs) with classical SVMs using identical mel-spectrogram preprocessing and stratified 5-fold cross-validation across four corpora (ASVspoof 2019 LA, ASVspoof 5 (2024), ADD23, and an In-the-Wild set). QSVMs achieve consistently lower equal-error rates (EER): 0.183 vs. 0.299 on ASVspoof 5 (2024), 0.081 vs. 0.188 on ADD23, 0.346 vs. 0.399 on ASVspoof 2019 LA, and 0.355 vs. 0.413 on the In-the-Wild set. At the EER operating point (where FPR equals FNR), these correspond to absolute false-positive-rate reductions of 0.116 (38.8% relative), 0.107 (56.9%), 0.053 (13.3%), and 0.058 (14.0%), respectively. We also report the consistency of the results across cross-validation folds, together with margin-based measures of class separation, using identical settings for both models. The only modification is the kernel: the features and SVM remain unchanged, no additional trainable parameters are introduced, and the quantum kernel is computed on a conventional computer.
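The reduction figures follow directly from the reported EERs, since FPR equals EER at the EER operating point. The short sketch below recomputes them as a sanity check; the dictionary layout and the helper name `fpr_reduction` are illustrative choices, not part of the original work, and the percentages are taken to be relative to the classical-SVM EER.

```python
# EER pairs (QSVM, classical SVM) copied from the figures reported above.
eers = {
    "ASVspoof 5 (2024)": (0.183, 0.299),
    "ADD23": (0.081, 0.188),
    "ASVspoof 2019 LA": (0.346, 0.399),
    "In-the-Wild": (0.355, 0.413),
}


def fpr_reduction(qsvm_eer, svm_eer):
    """Return (absolute, relative %) FPR reduction at the EER operating point.

    Hypothetical helper: at the EER point FPR == FNR == EER, so the FPR
    reduction is the EER difference; the relative value is taken with
    respect to the classical-SVM baseline.
    """
    absolute = svm_eer - qsvm_eer
    relative = absolute / svm_eer
    return round(absolute, 3), round(100 * relative, 1)


reductions = {name: fpr_reduction(q, c) for name, (q, c) in eers.items()}

for name, (abs_red, rel_red) in reductions.items():
    print(f"{name}: absolute {abs_red:.3f}, relative {rel_red:.1f}%")
```

Running this reproduces the four absolute and relative reductions quoted above, which confirms that each percentage is the absolute reduction divided by the classical baseline.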