Risk-based active learning is an approach to developing statistical classifiers for online decision support. In this approach, data-label querying is guided by the expected value of perfect information (EVPI) for incipient data points. For structural health monitoring (SHM) applications, the value of information is evaluated with respect to a maintenance decision process, and data-label querying corresponds to inspecting a structure to determine its health state. Sampling bias is a known issue within active-learning paradigms; it occurs when an active-learning process over- or under-samples specific regions of a feature space, resulting in a training set that is not representative of the underlying distribution. This bias ultimately degrades decision-making performance and, consequently, leads to unnecessary costs. The current paper outlines a risk-based approach to active learning that utilises a semi-supervised Gaussian mixture model. The semi-supervised approach counteracts sampling bias by incorporating pseudo-labels for unlabelled data via an expectation-maximisation (EM) algorithm. The approach is demonstrated on a numerical example representative of the decision processes found in SHM.
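The two ingredients named above, an EVPI-based query score and a semi-supervised Gaussian mixture fitted by EM, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes one Gaussian component per health state, labelled points keep fixed one-hot responsibilities while unlabelled points receive soft pseudo-labels in each E-step, and every class has at least one labelled point for initialisation. The utility matrix `U`, the inspection cost, and all function names (`semi_supervised_em`, `evpi`, etc.) are hypothetical. The query rule "inspect when EVPI exceeds the inspection cost" is likewise an assumed simplification.

```python
import numpy as np

def log_gaussian_pdf(X, mean, cov):
    """Log multivariate-normal density at each row of X."""
    d = X.shape[1]
    diff = X - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distances
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))

def posterior(X, weights, means, covs):
    """Class posteriors p(y = k | x) under the GMM, computed stably in log space."""
    log_lik = np.column_stack([np.log(w) + log_gaussian_pdf(X, m, c)
                               for w, m, c in zip(weights, means, covs)])
    log_lik -= log_lik.max(axis=1, keepdims=True)
    lik = np.exp(log_lik)
    return lik / lik.sum(axis=1, keepdims=True)

def semi_supervised_em(X_l, y_l, X_u, n_classes, n_iter=50, reg=1e-6):
    """EM for a GMM where labelled points keep one-hot responsibilities (assumed scheme)."""
    R_l = np.eye(n_classes)[y_l]          # fixed responsibilities for labelled data
    X = np.vstack([X_l, X_u])
    d = X.shape[1]
    # Initialise from labelled data only (assumes >= 1 labelled point per class)
    weights = R_l.mean(axis=0)
    means = np.array([X_l[y_l == k].mean(axis=0) for k in range(n_classes)])
    covs = np.array([np.eye(d) for _ in range(n_classes)])
    for _ in range(n_iter):
        R_u = posterior(X_u, weights, means, covs)  # E-step: pseudo-labels for unlabelled data
        R = np.vstack([R_l, R_u])
        Nk = R.sum(axis=0)                          # M-step on labelled + pseudo-labelled data
        weights = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        covs = np.array([((R[:, k, None] * (X - means[k])).T @ (X - means[k])) / Nk[k]
                         + reg * np.eye(d) for k in range(n_classes)])
    return weights, means, covs

def evpi(p, utility):
    """EVPI of learning the true class; p is (n, K), utility[action, class] is (A, K)."""
    value_without_info = (p @ utility.T).max(axis=1)  # max_a E_y[U(a, y)]
    value_with_info = p @ utility.max(axis=0)         # E_y[max_a U(a, y)]
    return value_with_info - value_without_info

# Usage with synthetic data: two health states, few labels, many unlabelled points
rng = np.random.default_rng(0)
X_l = np.array([[0.0, 0.0], [0.2, -0.1], [3.0, 3.0], [3.1, 2.8]])
y_l = np.array([0, 0, 1, 1])
X_u = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
weights, means, covs = semi_supervised_em(X_l, y_l, X_u, n_classes=2)

# Hypothetical utilities: rows = actions (do nothing, repair), columns = states (healthy, damaged)
U = np.array([[0.0, -100.0], [-10.0, -10.0]])
x_new = np.array([[1.5, 1.5]])
score = evpi(posterior(x_new, weights, means, covs), U)
inspect = score > 1.0  # hypothetical inspection cost
```

Points deep inside one class region have posteriors near 0 or 1 and therefore near-zero EVPI, so inspection effort concentrates where uncertainty actually changes the optimal maintenance action.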