We study the problem of finding confidence ellipsoids for an arbitrary distribution in high dimensions. Given samples from a distribution $D$ and a confidence parameter $α$, the goal is to find the smallest volume ellipsoid $E$ which has probability mass $\Pr_{D}[E] \ge 1-α$. Ellipsoids are a highly expressive class of confidence sets as they can capture correlations in the distribution, and can approximate any convex set. This problem has been studied in many different communities. In statistics, this is the classic minimum volume estimator introduced by Rousseeuw as a robust non-parametric estimator of location and scatter. However in high dimensions, it becomes NP-hard to obtain any non-trivial approximation factor in volume when the condition number $β$ of the ellipsoid (ratio of the largest to the smallest axis length) goes to $\infty$. This motivates the focus of our paper: can we efficiently find confidence ellipsoids with volume approximation guarantees when compared to ellipsoids of bounded condition number $β$? Our main result is a polynomial time algorithm that finds an ellipsoid $E$ whose volume is within a $O(β^{γd})$ multiplicative factor of the volume of best $β$-conditioned ellipsoid while covering at least $1-O(α/γ)$ probability mass for any $γ< α$. We complement this with a computational hardness result that shows that such a dependence seems necessary up to constants in the exponent. The algorithm and analysis uses the rich primal-dual structure of the minimum volume enclosing ellipsoid and the geometric Brascamp-Lieb inequality. As a consequence, we obtain the first polynomial time algorithm with approximation guarantees on worst-case instances of the robust subspace recovery problem.
翻译:我们研究在高维空间中为任意分布寻找置信椭球的问题。给定来自分布 $D$ 的样本和置信参数 $α$,目标是找到体积最小的椭球 $E$,使其概率质量满足 $\Pr_{D}[E] \ge 1-α$。椭球是一类表达能力极强的置信集,因为它们能够捕捉分布中的相关性,并且可以逼近任何凸集。该问题已在多个不同领域得到研究。在统计学中,这是由Rousseeuw引入的经典最小体积估计器,作为位置和散度的鲁棒非参数估计器。然而在高维情况下,当椭球的条件数 $β$(最长轴与最短轴长度之比)趋于 $\infty$ 时,获得任何非平凡的体积近似因子都成为NP难问题。这促使我们聚焦于本文的核心问题:与条件数有界 $β$ 的椭球相比,我们能否高效地找到具有体积近似保证的置信椭球?我们的主要成果是一个多项式时间算法,该算法找到一个椭球 $E$,其体积在最佳 $β$ 条件椭球体积的 $O(β^{γd})$ 倍乘因子范围内,同时对任意 $γ< α$ 至少覆盖 $1-O(α/γ)$ 的概率质量。我们通过计算复杂性结果补充说明,这种指数依赖关系在常数阶内似乎是必要的。该算法及分析利用了最小包围椭球的丰富原始-对偶结构和几何Brascamp-Lieb不等式。作为推论,我们首次获得了在鲁棒子空间恢复问题最坏情况实例上具有近似保证的多项式时间算法。