Selecting input variables or design points for statistical models has been of great interest in sequential design and active learning. Motivated by two scientific examples, this paper presents a strategy for selecting the design points of a regression model when the underlying regression function is discontinuous. The first example is compressive material imaging, where the goal is to accelerate imaging, and the second is a sequential design for learning a phase diagram in chemistry. In both examples the underlying regression functions have discontinuities, so many existing design optimization approaches cannot be applied, because they mostly assume a continuous regression function. A few studies address the estimation of a discontinuous regression function from noisy observations, but they typically assume that all observations are provided in advance. In this paper, we develop a strategy for selecting the design points for regression analysis with discontinuities. We first review existing approaches to design optimization and active learning for regression analysis and discuss their limitations in handling a discontinuous regression function. We then present our novel design strategy for regression analysis with discontinuities: we first establish some statistical properties under a fixed design and then use these properties to propose a new criterion for selecting the design points. Finally, we demonstrate sequential design with the new criterion on comprehensive simulated examples and apply it to the two motivating examples.