The Mahalanobis distance is a classical tool for measuring the covariance-adjusted distance between points in $\bbR^d$. In this work, we extend the Mahalanobis distance to separable Banach spaces by reinterpreting it as a Cameron-Martin norm associated with a probability measure. This approach leads to a basis-free, data-driven notion of anomaly distance through the so-called variance norm, which can be estimated naturally using the empirical measure of a sample. Our framework generalizes the classical $\bbR^d$, functional $(L^2[0,1])^d$, and kernelized settings; importantly, it accommodates non-injective covariance operators. We prove that the variance norm is invariant under invertible bounded linear transformations of the data, extending previous results that were limited to unitary operators. In the Hilbert space setting, we connect the variance norm to the RKHS of the covariance operator and establish consistency and convergence results for estimation from empirical measures with Tikhonov regularization. Using the variance norm, we introduce a kernelized nearest-neighbour Mahalanobis distance and study some of its finite-sample concentration properties. In an empirical study on 12 real-world data sets, we show that the kernelized nearest-neighbour Mahalanobis distance outperforms the traditional kernelized Mahalanobis distance for multivariate time series novelty detection, using state-of-the-art time series kernels such as the signature, global alignment, and Volterra reservoir kernels.
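To fix intuition for the estimation scheme described above, the following is a minimal finite-dimensional ($\bbR^d$) sketch of a Mahalanobis-type distance computed from a sample, with the inverse covariance stabilized by Tikhonov regularization. The function name and the regularization parameter `lam` are illustrative choices, not notation from the paper, and this sketch does not reproduce the Banach-space variance norm or the kernelized nearest-neighbour construction.

```python
import numpy as np

def tikhonov_mahalanobis(x, X, lam=1e-3):
    """Empirical Mahalanobis-type distance of x from the sample X.

    Finite-dimensional illustration only: the covariance is estimated
    from the sample, and (C + lam * I) replaces C so that the distance
    is well defined even when C is singular (cf. non-injective
    covariance operators in the infinite-dimensional setting).
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    C = Xc.T @ Xc / X.shape[0]  # empirical covariance matrix
    d = x - mu
    # Solve (C + lam * I) s = d instead of forming an explicit inverse.
    s = np.linalg.solve(C + lam * np.eye(C.shape[0]), d)
    return float(np.sqrt(d @ s))
```

As `lam` decreases toward zero with growing sample size, this regularized estimate approaches the classical Mahalanobis distance whenever the true covariance is invertible; the paper's consistency results concern the analogous limit for empirical measures in Hilbert spaces.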