Traditional methods for spatial inference estimate smooth interpolating fields from features measured at points with known locations. When the spatial locations of some observations are missing, the fields and the locations can be inferred jointly, since the fields inform the locations and vice versa. If the number of missing locations is large, conventional Bayesian inference fails when the generative model for the data is even slightly misspecified, because of feedback between the estimated fields and the imputed locations. Semi-Modular Inference (SMI) offers a solution: it controls the feedback between modular components of the joint model through a hyper-parameter called the influence parameter. Our work is motivated by linguistic studies on a large corpus of late-medieval English textual dialects. We learn dialect fields from dialect features observed in ``anchor texts'' with known location and, simultaneously, estimate the location of origin of ``floating'' texts whose origin is unknown. The optimal influence parameter minimises a loss measuring predictive accuracy on held-out anchor data. We compute a flow-based variational approximation to the SMI posterior for our model, which allows the optimal influence parameter to be computed efficiently. MCMC-based approaches, feasible only on small subsets of the data, are used to check the variational approximation.
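The role of the influence parameter can be illustrated with a deliberately simple sketch. It is not the dialect model or the flow-based variational method described above: it assumes a toy conjugate-normal setting with a flat prior, in which a trusted "anchor" module `z` and a possibly biased second module `y` both inform a shared parameter `phi`, and the likelihood of `y` is raised to a power `eta` in [0, 1]. Setting `eta = 0` cuts the feedback from `y` and `eta = 1` recovers conventional Bayesian inference. All names (`eta_posterior_mean`, `heldout_loss`, `select_eta`) and the squared-error loss are illustrative assumptions.

```python
import numpy as np


def eta_posterior_mean(z, y, eta):
    """Posterior mean of phi under a flat prior and unit-variance normal
    likelihoods, with the y-module likelihood raised to the power eta.
    (Toy stand-in for an influence-controlled posterior; an assumption.)"""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    return (z.sum() + eta * y.sum()) / (z.size + eta * y.size)


def heldout_loss(z_train, z_test, y, eta):
    """Mean squared error of the eta-posterior mean on held-out anchor data."""
    pred = eta_posterior_mean(z_train, y, eta)
    return float(np.mean((np.asarray(z_test, dtype=float) - pred) ** 2))


def select_eta(z_train, z_test, y, grid):
    """Grid search for the influence value with the smallest held-out loss."""
    losses = [heldout_loss(z_train, z_test, y, eta) for eta in grid]
    best = int(np.argmin(losses))
    return float(grid[best]), losses


# Synthetic data: the y-module is biased (mean 0.5 instead of 0),
# mimicking a slightly misspecified module.
rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, size=40)
y = rng.normal(0.5, 1.0, size=400)
grid = np.linspace(0.0, 1.0, 11)
eta_star, losses = select_eta(z[:30], z[30:], y, grid)
print("selected influence:", eta_star)
```

In this toy case the tempered posterior is available in closed form, so a grid search is cheap. In the setting of the abstract, each value of the influence parameter requires its own posterior approximation, and making that search affordable is what the variational approximation is used for.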