Interpretability is important for many applications of machine learning to signal data. It covers aspects such as how well a model fits the data, how accurately explanations are drawn from it, and how well people can understand those explanations. Feature extraction and selection can improve model interpretability by identifying structures in the data that are both informative and intuitively meaningful. To this end, we propose a signal classification framework that combines feature extraction with feature selection using the knockoff filter, a method that provides guarantees on the false discovery rate (FDR) among the selected features. We apply this framework to a dataset of Raman spectroscopy measurements from bacterial samples. Using a wavelet-based feature representation of the data and a logistic regression classifier, our framework achieves significantly higher predictive accuracy than when the original features are used as input. We also benchmark against features obtained through principal components analysis (PCA) and against the original features fed into a neural network-based classifier. Our framework achieves better predictive performance than the former and performance comparable to the latter, while offering the advantage of a more compact and human-interpretable set of features.
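To make the FDR-controlled selection step concrete, here is a minimal sketch of the knockoff+ thresholding rule in plain Python. It assumes the per-feature knockoff statistics W_j have already been computed (for example, by comparing each feature's fitted coefficient magnitude with that of its knockoff copy). The function names and the numbers in the usage example are hypothetical illustrations, not the paper's code or data.

```python
def knockoff_plus_threshold(w, q):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    w : list of knockoff statistics W_j (large positive values suggest a real signal)
    q : target false discovery rate, e.g. 0.1
    Returns float('inf') when no threshold meets the target.
    """
    # Candidate thresholds are the distinct nonzero magnitudes of the statistics.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # proxy count of false discoveries
        positives = sum(1 for x in w if x >= t)   # features that would be selected
        # The "+1" in the numerator is the knockoff+ correction.
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten wavelet features (illustrative values only).
w_example = [5.0, 4.2, -0.3, 3.1, 0.0, 2.5, -0.1, 1.8, 0.9, 0.4]
selected = knockoff_select(w_example, q=0.25)
```

In a pipeline like the one described above, the selected indices would pick out the wavelet coefficients passed to the logistic regression classifier. A stricter target q can return an empty selection when too few features carry a clear signal.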