This paper proposes a novel wavelet-packet-based feature extraction approach for text-independent speaker recognition. The features are extracted by combining Mel Frequency Cepstral Coefficients (MFCC) with the Wavelet Packet Transform (WPT). This hybrid feature technique pairs the human-ear simulation offered by MFCC with the multi-resolution property and noise robustness of WPT. To validate the proposed approach, we use a Gaussian Mixture Model (GMM) as the classifier for text-independent speaker identification and a Hidden Markov Model (HMM) as the classifier for speaker verification. The approach is tested on the VoxForge speech corpus and the CSTR US KED TIMIT database. To assess noise robustness, it is also evaluated after adding a standard noise signal at different signal-to-noise ratio (SNR) levels. Experimental results show improved performance on both speaker identification and speaker verification.
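The noise-robustness evaluation depends on mixing a noise signal into clean speech at a controlled SNR. The following is a minimal sketch of that step, not the authors' implementation. The function names `mean_power`, `add_noise_at_snr`, and `measured_snr_db` are hypothetical, and the sine tone, the white Gaussian noise, and the SNR levels in the demo are illustrative stand-ins for the speech, noise signal, and levels actually used in the experiments.

```python
import math
import random


def mean_power(samples):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to give the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, measured against its clean reference."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption: a 220 Hz tone sampled at 16 kHz stands in for a speech utterance.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    # Assumption: white Gaussian noise stands in for the standard noise signal.
    noise = [rng.gauss(0, 1) for _ in range(16000)]
    for snr_db in (20, 10, 5, 0):  # illustrative levels only
        noisy = add_noise_at_snr(speech, noise, snr_db)
        print(snr_db, round(measured_snr_db(speech, noisy), 2))
```

Scaling the noise rather than the speech keeps the clean signal's level fixed across conditions, so any change in recognition performance can be attributed to the noise level alone.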