Neural text classification models typically treat output labels as categorical variables that lack descriptions and semantics. This makes their parametrization dependent on the label set size and, hence, prevents them from scaling to large label sets or generalizing to unseen ones. Existing joint input-label text models overcome these issues by exploiting label descriptions, but they are unable to capture complex label relationships, have rigid parametrization, and their gains on unseen labels often come at the expense of weak performance on the labels seen during training. In this paper, we propose a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels. The model consists of a joint non-linear input-label embedding with controllable capacity and a joint-space-dependent classification unit that is trained with a cross-entropy loss to optimize classification performance. We evaluate our model on full-resource and low- or zero-resource text classification of multilingual news and biomedical text with a large label set. It outperforms both monolingual and multilingual models that do not leverage label semantics and previous joint input-label space models in both scenarios.
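The core idea of a joint input-label embedding can be sketched in a few lines: both the encoded input text and the encoded label descriptions are projected into a shared space, and each label is scored by its similarity to the input in that space. The sketch below is a minimal, illustrative NumPy version, not the paper's actual architecture; all dimensions, variable names, and the use of random vectors in place of real text and label encoders are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: d_x = input encoding size, d_y = label
# description encoding size, d_j = joint space size (controls capacity),
# n_labels = label set size.
d_x, d_y, d_j, n_labels = 8, 6, 4, 5

# Pretend upstream encoders already produced one vector for the input
# text and one per label description (e.g. averaged word embeddings).
x = rng.normal(size=d_x)                 # encoded input text
E = rng.normal(size=(n_labels, d_y))     # encoded label descriptions

# Joint non-linear input-label embedding: project both sides into a
# shared space of size d_j. Note the parameter count depends on
# d_x, d_y, d_j -- not on n_labels -- so unseen labels with
# descriptions can be scored without retraining.
U = rng.normal(size=(d_x, d_j))          # input-side projection
V = rng.normal(size=(d_y, d_j))          # label-side projection

def joint_scores(x, E):
    """Score each label by similarity to the input in the joint space."""
    h_x = np.tanh(x @ U)                 # non-linear input embedding
    h_y = np.tanh(E @ V)                 # non-linear label embeddings
    return h_y @ h_x                     # one score per label

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Scores feed a softmax; training would minimize the cross-entropy
# of the true label under this distribution.
probs = softmax(joint_scores(x, E))
true_label = 2                           # hypothetical gold label index
loss = -np.log(probs[true_label])        # cross-entropy for one example
```

Because every label enters only through its description vector, adding a new label at test time amounts to appending one more row to `E`.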