Identifying low-dimensional sufficient structures in nonlinear sufficient dimension reduction (SDR) has long been a fundamental yet challenging problem. Most existing methods lack theoretical guarantees of exhaustiveness in identifying such lower-dimensional structures, at either the population level or the sample level. We address this issue by proposing a new method, generative sufficient dimension reduction (GenSDR), which leverages modern generative models. We show that GenSDR fully recovers the information contained in the central $\sigma$-field at both the population and sample levels. In particular, at the sample level, we establish a consistency property for the GenSDR estimator from the perspective of conditional distributions, capitalizing on the distributional learning capabilities of deep generative models. Moreover, by incorporating an ensemble technique, we extend GenSDR to accommodate non-Euclidean responses, substantially broadening its applicability. Extensive numerical results demonstrate the strong empirical performance of GenSDR and highlight its potential for addressing a wide range of complex real-world tasks.