Multi-Domain Active Learning: Literature Review and Comparative Study

Multi-domain learning (MDL) refers to learning a set of models simultaneously, where each model is specialized to perform a task in a particular domain. MDL generally requires a high labeling effort, because data must be labeled by human experts for every domain. Active learning (AL) can reduce this effort by labeling only the most informative data. The resulting paradigm is termed multi-domain active learning (MDAL). In this work, we provide an exhaustive literature review of the fields relevant to MDAL, including AL, cross-domain information sharing schemes, and cross-domain instance evaluation approaches. We find that the few studies conducted directly on MDAL cannot serve as off-the-shelf solutions for more general MDAL tasks. To fill this gap, we construct an MDAL pipeline and present a comprehensive comparative study of thirty algorithms, obtained by combining six representative MDL models with five commonly used AL strategies. We evaluate the algorithms on six datasets involving textual and visual classification tasks. In most cases, AL brings notable improvements to MDL, and the naive best vs. second best (BvSB) uncertainty strategy performs competitively with state-of-the-art AL strategies. Moreover, BvSB with the MAN model consistently achieves top or above-average performance on all the datasets. Furthermore, we qualitatively analyze the behaviors of the well-performing strategies and models, shedding light on their superior performance in the comparison. Finally, we recommend using BvSB with the MAN model in MDAL applications, given their good performance in the experiments.
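As an illustration of the BvSB uncertainty strategy named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the common formulation of BvSB: an instance is scored by the gap between its two highest predicted class probabilities, and the instances with the smallest gaps are queried first. The function names `bvsb_margin` and `select_bvsb` and the probability values are hypothetical, chosen only for this example.

```python
import numpy as np


def bvsb_margin(probs):
    """Return the best-minus-second-best probability gap for each row."""
    probs = np.asarray(probs, dtype=float)
    # Sort each row in ascending order; the last two columns then hold
    # the second-best and best class probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def select_bvsb(probs, budget):
    """Return indices of the `budget` instances with the smallest gap."""
    margins = bvsb_margin(probs)
    # A small gap means the model is torn between two classes,
    # so those instances are treated as the most informative.
    order = np.argsort(margins, kind="stable")
    return order[:budget]


# Hypothetical predicted class probabilities for four unlabeled instances.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.60, 0.30, 0.10],
])

picked = select_bvsb(probs, budget=2)
print(picked.tolist())
```

In a multi-domain setting, the unlabeled pool would contain instances from several domains, each scored by the model for its own domain. Whether the scores are ranked jointly or per domain is a design choice that this sketch leaves open.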
