We prove that $\tilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to error $\varepsilon$ in total variation distance. This improves both the known upper bound and the known lower bound for this problem. For mixtures of axis-aligned Gaussians, we show that $\tilde{O}(k d / \varepsilon^2)$ samples suffice, matching a known lower bound. Moreover, these results hold in the agnostic-learning/robust-estimation setting as well, where the target distribution is only approximately a mixture of Gaussians. The upper bound is proved via a novel technique for distribution learning based on a notion of `compression.' Any class of distributions that admits such a compression scheme can be learned with few samples. Furthermore, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. The core of our main result is showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized compression scheme.
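For intuition, the display below gives an informal rendering of the compression notion invoked above. It is a sketch only: the parameter names $\tau$ (number of retained samples), $t$ (number of side-information bits), and $m$ (number of samples drawn), as well as the success probability $2/3$, are illustrative choices standard in this line of work, not quantities fixed by this abstract. Roughly, a class $\mathcal{F}$ admits a compression scheme if a fixed decoder $\mathcal{D}$ can reconstruct any $f \in \mathcal{F}$, up to small total variation error, from a few of its own samples plus a few bits:
\[
  \exists\, \mathcal{D}\ \ \forall f \in \mathcal{F}:\qquad
  \Pr_{S \sim f^{m}}\Bigl[\,\exists\, S' \subseteq S,\ |S'| \le \tau,\
  \exists\, b \in \{0,1\}^{t}:\
  d_{\mathrm{TV}}\bigl(\mathcal{D}(S', b),\, f\bigr) \le \varepsilon \,\Bigr]
  \;\ge\; \tfrac{2}{3}.
\]
Under this reading, the closure properties stated above say that small schemes for a base class yield small schemes for products and mixtures of that class, and the sample-complexity upper bound follows from exhibiting such a scheme for single Gaussians in $\mathbb{R}^d$.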