To reduce human annotation effort, the programmatic weak supervision (PWS) paradigm abstracts weak supervision sources as labeling functions (LFs) and uses a label model to aggregate the outputs of multiple LFs into training labels. Most existing label models require a parameter-learning step for each dataset. In this work, we present a hyper label model that, once learned, infers the ground-truth labels for each dataset in a single forward pass, without dataset-specific parameter learning. The hyper label model approximates an optimal analytical (yet computationally intractable) solution for the ground-truth labels. We train the model on synthetic data generated in a way that ensures it approximates this analytical optimum, and we build the model on a Graph Neural Network (GNN) so that its predictions are invariant (or equivariant) to permutations of the LFs (or data points). On 14 real-world datasets, our hyper label model outperforms the best existing methods in both accuracy (by 1.4 points on average) and efficiency (by six times on average).
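To make the PWS setup concrete, here is a minimal sketch, not from the paper: a few hypothetical labeling functions for a toy spam task, aggregated by majority vote, the simplest label model that learned label models (including the hyper label model described above) improve upon. All function names and heuristics are illustrative assumptions.

```python
import numpy as np

ABSTAIN = -1  # an LF may abstain when its heuristic does not apply

# Hypothetical labeling functions for a toy spam task (0 = ham, 1 = spam)
def lf_keyword(text):
    return 1 if "free" in text.lower() else ABSTAIN

def lf_short(text):
    return 0 if len(text) < 20 else ABSTAIN

def lf_exclaim(text):
    return 1 if text.count("!") >= 2 else ABSTAIN

def majority_vote(lf_outputs):
    """Simplest label model: per data point, take the most common
    non-abstain vote; abstain if every LF abstained."""
    labels = []
    for votes in lf_outputs:
        valid = [v for v in votes if v != ABSTAIN]
        labels.append(int(np.bincount(valid).argmax()) if valid else ABSTAIN)
    return labels

texts = ["Free money!! Click now!", "Meeting at 3pm", "You won a FREE prize"]
# L is the LF output matrix: one row per data point, one column per LF
L = [[lf(t) for lf in (lf_keyword, lf_short, lf_exclaim)] for t in texts]
print(majority_vote(L))  # → [1, 0, 1]
```

A learned label model replaces the unweighted vote with weights inferred from the LFs' agreement patterns; the hyper label model goes further by producing the aggregated labels in a single forward pass, with no per-dataset fitting.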