Speaker Recognition -- Wavelet Packet Based Multiresolution Feature Extraction Approach

From arXiv: This paper was originally written in summer 2013 and previously made available on Figshare. The present submission is uploaded for archival and citation purposes.

This paper proposes a novel wavelet-packet-based feature extraction approach for text-independent speaker recognition. Features are extracted by combining Mel Frequency Cepstral Coefficients (MFCC) with the Wavelet Packet Transform (WPT). This hybrid feature technique pairs the human-ear simulation offered by MFCC with the multi-resolution property and noise robustness of WPT. To validate the approach, we use a Gaussian Mixture Model (GMM) as the classifier for text-independent speaker identification and a Hidden Markov Model (HMM) as the classifier for speaker verification. The approach is tested on the VoxForge speech corpus and the CSTR US KED TIMIT database. It is also evaluated after adding a standard noise signal at different SNR levels to assess noise robustness. Experimental results show that the approach achieves better results on both speaker identification and speaker verification.
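The abstract does not say which wavelet, decomposition depth, or MFCC/WPT combination scheme the paper uses, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a Haar wavelet, a full three-level packet tree, and a DCT over log sub-band energies (mirroring the last step of MFCC). It also shows one common way to mix noise into a signal at a target SNR. All function names here are hypothetical.

```python
import math
import random


def haar_step(x):
    """One Haar analysis step: (approximation, detail), each half the length of x."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet_leaves(frame, levels):
    """Full wavelet packet tree: unlike the plain DWT, BOTH branches are split
    at every level. Returns 2**levels sub-bands in filter-bank order (not
    sorted by frequency). len(frame) should be divisible by 2**levels."""
    nodes = [list(frame)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def log_subband_energies(frame, levels=3, floor=1e-12):
    """Log energy of each leaf sub-band; floor avoids log(0) on silent bands."""
    bands = wavelet_packet_leaves(frame, levels)
    return [math.log(sum(c * c for c in band) + floor) for band in bands]


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, used here to decorrelate the log energies
    (an assumption borrowed from the standard MFCC pipeline)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def add_noise_at_snr(signal, noise, snr_db):
    """Scale noise so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    noise = noise[: len(signal)]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [math.sin(2 * math.pi * 5 * t / 256) for t in range(256)]
    noisy = add_noise_at_snr(frame, [rng.gauss(0, 1) for _ in frame], snr_db=10.0)
    features = dct_ii(log_subband_energies(noisy, levels=3), n_coeffs=6)
    print(len(features), "coefficients for one frame")
```

In a full system, per-frame vectors like these would be stacked over an utterance and passed to a classifier such as a GMM. In practice a library such as PyWavelets would likely replace the hand-rolled Haar transform.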
