Text classification is the task of automatically assigning predefined category labels to a document based on its content or topic.
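To make the definition concrete, here is a minimal illustration: a document is mapped to one of several predefined labels based on its content. The tiny keyword-count classifier and the example categories below are illustrative assumptions, not a method from this page.

```python
from collections import Counter

# Illustrative label set: each predefined category is described by a few
# keywords (hypothetical example categories).
LABEL_KEYWORDS = {
    "sports": {"game", "team", "score"},
    "politics": {"election", "vote", "policy"},
}

def classify(doc):
    """Assign the label whose keywords appear most often in the document."""
    words = Counter(doc.lower().split())
    scores = {label: sum(words[w] for w in kws)
              for label, kws in LABEL_KEYWORDS.items()}
    return max(scores, key=scores.get)

label = classify("The team won the game with a record score")  # → "sports"
```

Real systems replace the keyword scores with learned models, but the input/output contract (document in, category label out) is the same.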


Continual learning is becoming increasingly important because it enables NLP models to learn and accumulate knowledge over time. Prior continual learning methods mainly aim to preserve knowledge from previous tasks and do not generalize the model well to new tasks. In this work, we propose an information-disentanglement-based regularization method for continual learning on text classification. Our method first disentangles the text hidden space into representations that are generic to all tasks and representations specific to each individual task, and further regularizes these representations differently to better constrain the knowledge required for generalization. We also introduce two simple auxiliary tasks, next-sentence prediction and task-id prediction, to learn better generic and specific representation spaces. Experiments on large-scale benchmarks demonstrate the effectiveness of our method on sequences of continual text classification tasks of varying orders and lengths.
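The core idea of regularizing generic and task-specific representations differently can be sketched as follows. This is a toy numpy sketch under stated assumptions: the even split of the hidden vector, the function name, and the particular penalties (drift toward the previous task's generic representation, light L2 shrinkage on the specific part) are illustrative choices, not the paper's actual formulation.

```python
import numpy as np

def disentangled_reg_loss(hidden, prev_generic, lam_g=1.0, lam_s=0.1, split=0.5):
    """Split a hidden vector into a generic part (shared across tasks) and a
    task-specific part, then regularize each differently: the generic part is
    pulled toward its value under the previous task's model to preserve
    transferable knowledge, while the specific part is only weakly shrunk so
    it remains free to adapt to the current task. (Hypothetical sketch.)"""
    d = int(hidden.shape[-1] * split)
    generic, specific = hidden[..., :d], hidden[..., d:]
    reg_generic = lam_g * np.mean((generic - prev_generic) ** 2)  # penalize drift
    reg_specific = lam_s * np.mean(specific ** 2)                 # light shrinkage
    return reg_generic + reg_specific

h = np.array([1.0, 2.0, 3.0, 4.0])      # current hidden state (toy)
g_old = np.array([1.0, 2.0])            # generic part under the previous task
loss = disentangled_reg_loss(h, g_old)  # zero drift + 0.1 * mean([9, 16]) = 1.25
```

In practice this term would be added to the classification loss, alongside the auxiliary next-sentence-prediction and task-id-prediction losses the abstract mentions.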


Latest Papers

Weakly-supervised text classification aims to induce text classifiers from only a few user-provided seed words. The vast majority of previous work assumes high-quality seed words are given. However, expert-annotated seed words are sometimes non-trivial to come up with. Furthermore, in the weakly-supervised setting we have no labeled documents with which to measure the seed words' efficacy, making seed word selection "a walk in the dark". In this work, we remove the need for expert-curated seed words by first mining (noisy) candidate seed words associated with the category names. We then train interim models with individual candidate seed words. Lastly, we estimate the interim models' error rates in an unsupervised manner. The seed words that yield the lowest estimated error rates are added to the final seed word set. A comprehensive evaluation on six binary classification tasks over four popular datasets demonstrates that the proposed method outperforms a baseline using only category name seed words and obtains comparable performance to a counterpart using expert-annotated seed words.
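The selection loop described above (mine candidates, train an interim model per seed, estimate its error rate without labels, keep the lowest-error seeds) can be sketched in miniature. Everything here is a hypothetical stand-in: the keyword-matching "interim model" and the majority-vote disagreement proxy for the unsupervised error estimate are illustrative choices, not the paper's actual estimator.

```python
def interim_predict(seed, doc):
    # Toy interim model: predict the positive class iff the seed word appears.
    return int(seed in doc.split())

def estimate_error(seed, candidates, docs):
    """Unsupervised error proxy (assumption): disagreement with the majority
    vote of all candidate-seed models, used in place of labeled data."""
    errors = 0
    for doc in docs:
        votes = [interim_predict(s, doc) for s in candidates]
        majority = int(sum(votes) * 2 >= len(votes))
        errors += interim_predict(seed, doc) != majority
    return errors / len(docs)

def select_seeds(candidates, docs, k=2):
    # Keep the k candidate seeds with the lowest estimated error rates.
    ranked = sorted(candidates, key=lambda s: estimate_error(s, candidates, docs))
    return ranked[:k]

docs = ["great movie great acting", "terrible plot", "great fun", "boring terrible"]
best = select_seeds(["great", "terrible", "boring"], docs, k=2)
```

The point of the sketch is the control flow, not the toy scoring: candidate seeds are ranked purely by an estimate computed from unlabeled documents, so no expert-curated seed list or labeled validation set is needed.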
