The assumption that response and predictor belong to the same statistical unit may be violated in practice. Unbiased estimation and recovery of the true label ordering from unlabeled data are challenging tasks and have attracted increasing attention in the recent literature. In this paper, we present a relatively complete analysis of the label permutation problem for the generalized linear model with multivariate responses. The theory is established under three scenarios: with knowledge of the true parameters, with partial knowledge of the underlying label permutation matrix, and without any such knowledge. Our results remove the stringent conditions required by the current literature and are further extended to the missing-observation setting, which has not previously been considered for the label permutation problem. On the computational side, we propose two methods, a "maximum likelihood estimation" algorithm and a "two-step estimation" algorithm, to accommodate different settings. When the proportion of permuted labels is moderate, both methods work effectively. Multiple numerical experiments are provided and corroborate our theoretical findings.
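To make the problem setting concrete, the following is a minimal sketch of the simplest scenario mentioned above, in which the true parameters are known and only the label permutation must be recovered. It assumes a Gaussian model with identity link (a special case of the generalized linear model), where maximizing the likelihood over permutations reduces to minimizing squared residuals. The function names (`predict`, `recover_permutation`, `simulate`), the design matrix, and the coefficient values are all hypothetical, and the exhaustive search over permutations is an illustrative stand-in that is only feasible for very small samples; it is not the "maximum likelihood estimation" or "two-step estimation" algorithm proposed in the paper.

```python
import itertools
import random


def predict(X, B):
    """Mean responses x_i^T B for each row x_i of X (identity link assumed)."""
    return [
        [sum(x[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for x in X
    ]


def recover_permutation(Y_obs, mu):
    """Exhaustive search for the permutation sigma that minimizes
    sum_i ||Y_obs[i] - mu[sigma[i]]||^2, which is the Gaussian maximum
    likelihood estimate when the parameters (hence mu) are known.
    The cost is O(n!), so this is only usable for tiny n."""
    n = len(mu)

    def cost(sigma):
        return sum(
            (y - m) ** 2
            for i in range(n)
            for y, m in zip(Y_obs[i], mu[sigma[i]])
        )

    return min(itertools.permutations(range(n)), key=cost)


def simulate(seed=0, noise_sd=0.01):
    """Generate a toy data set with shuffled responses (all values hypothetical)."""
    rng = random.Random(seed)
    X = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2], [2, 1]]  # n = 6 units, p = 2
    B = [[1.0, 0.5], [-1.0, 2.0]]                         # p x m, with m = 2 responses
    mu = predict(X, B)
    Y = [[m + rng.gauss(0.0, noise_sd) for m in row] for row in mu]
    pi = list(range(len(X)))
    rng.shuffle(pi)
    # Row i of the observed responses actually belongs to unit pi[i].
    Y_obs = [Y[pi[i]] for i in range(len(X))]
    return X, B, Y_obs, tuple(pi)


X, B, Y_obs, pi_true = simulate()
pi_hat = recover_permutation(Y_obs, predict(X, B))
print("recovered permutation matches truth:", pi_hat == pi_true)
```

In this toy configuration the mean responses are well separated relative to the noise level, so the permutation is identifiable. For larger samples the same objective is a linear assignment problem and would normally be handled by a dedicated solver rather than enumeration.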