Nonstationary non-Gaussian spatial data are common in many disciplines, including climate science, ecology, epidemiology, and the social sciences. Examples include count data on disease incidence and binary satellite data on cloud masks (cloud/no-cloud). Modeling such datasets as stationary spatial processes can be unrealistic because they are collected over large heterogeneous domains (i.e., spatial behavior differs across subregions). Although several approaches have been developed for nonstationary spatial models, these have focused primarily on Gaussian responses. In addition, fitting nonstationary models to large non-Gaussian datasets is computationally prohibitive. To address these challenges, we propose a scalable algorithm for modeling such data that leverages parallel computing on modern high-performance computing systems. We partition the spatial domain into disjoint subregions and fit locally nonstationary models using a carefully curated set of spatial basis functions. We then combine the local processes using a novel neighbor-based weighting scheme. Our approach scales well to massive datasets (e.g., 1 million samples) and can be implemented in nimble, a popular software environment for Bayesian hierarchical modeling. We demonstrate our method on simulated examples and on two large real-world datasets pertaining to infectious diseases and remote sensing.
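The partition, local fit, and combine steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that do not come from the abstract: the domain is the unit square split into a regular grid, the response is a Poisson count, each local model is a ridge-penalized Poisson regression on Gaussian radial basis functions fitted by Newton's method (standing in for the Bayesian fit in nimble), and the local predictors are combined with inverse-distance weights over the `k` nearest block centers. All function names and parameters (`n_side`, `knots_per_side`, `ridge`, `k`) are hypothetical.

```python
import numpy as np

def block_index(coords, n_side):
    """Assign each point in the unit square to a cell of an n_side x n_side grid."""
    ij = np.minimum((coords * n_side).astype(int), n_side - 1)
    return ij[:, 0] * n_side + ij[:, 1]

def block_centers(n_side):
    """Cell centers, ordered to match the labels returned by block_index."""
    g = (np.arange(n_side) + 0.5) / n_side
    return np.array([(x, y) for x in g for y in g])

def rbf_basis(coords, knots, bandwidth):
    """Intercept column plus Gaussian radial basis functions centered at the knots."""
    d2 = ((coords[:, None, :] - knots[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(coords), 1)), np.exp(-0.5 * d2 / bandwidth**2)])

def fit_poisson(X, y, ridge=1.0, n_iter=25):
    """Ridge-penalized Poisson regression (log link) via Newton's method.
    Assumption: a simple stand-in for a Bayesian hierarchical fit."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 0.1)  # start the intercept at the log mean count
    for _ in range(n_iter):
        mu = np.exp(np.clip(X @ beta, -20, 20))
        grad = X.T @ (y - mu) - ridge * beta
        hess = X.T @ (X * mu[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_local_models(coords, y, n_side, knots_per_side=3):
    """Fit one model per block. Blocks are independent, so this loop could be
    distributed across workers. Assumes every block contains data."""
    h = 1.0 / n_side
    offsets = (np.arange(knots_per_side) + 0.5) / knots_per_side - 0.5
    labels = block_index(coords, n_side)
    models = []
    for b, c in enumerate(block_centers(n_side)):
        knots = np.array([(c[0] + h * u, c[1] + h * v) for u in offsets for v in offsets])
        in_block = labels == b
        beta = fit_poisson(rbf_basis(coords[in_block], knots, h / 2), y[in_block])
        models.append((knots, beta))
    return models

def predict(coords_new, models, n_side, k=4):
    """Combine local log-intensities with inverse-distance weights over the
    k nearest block centers (a hypothetical neighbor-based weighting)."""
    h = 1.0 / n_side
    centers = block_centers(n_side)
    k = min(k, len(centers))
    d = np.linalg.norm(coords_new[:, None, :] - centers[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    eta_sum = np.zeros(len(coords_new))
    w_sum = np.zeros(len(coords_new))
    for b, (knots, beta) in enumerate(models):
        use = (nearest == b).any(axis=1)
        if not use.any():
            continue
        w = 1.0 / (d[use, b] + 1e-6)
        eta_sum[use] += w * (rbf_basis(coords_new[use], knots, h / 2) @ beta)
        w_sum[use] += w
    return np.exp(eta_sum / w_sum)  # predicted Poisson intensity
```

Calling `fit_local_models(coords, y, n_side=2)` followed by `predict(new_coords, models, n_side=2)` returns one positive intensity per new location. Because each block is fitted independently, the per-block loop is the natural place to parallelize.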