State-of-the-art Extreme Multi-Label Text Classification models rely on multi-label attention to focus on key tokens in input text, but learning good attention weights is challenging. We introduce PLANT - Pretrained and Leveraged Attention - a plug-and-play strategy for initializing attention. PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain. This architecture-agnostic approach integrates seamlessly with large language model backbones such as Mistral-7B, LLaMA3-8B, DeepSeek-V3, and Phi-3. PLANT outperforms state-of-the-art methods across tasks including ICD coding, legal topic classification, and content recommendation. Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains. For code and trained models, see https://github.com/debjyotiSRoy/xcube/tree/plant
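To make the idea of "planting" label-specific attention concrete, the sketch below shows a generic multi-label attention head whose per-label attention vectors can be overwritten with pretrained values. This is a minimal illustration, not the paper's implementation: the construction of the planted vectors from the Learning-to-Rank model and mutual information gain is omitted, and all names here (`LabelWiseAttention`, `plant_attention`) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelWiseAttention(nn.Module):
    """Multi-label attention: one attention vector per label.

    Token states H from any backbone are pooled into a label-specific
    document vector via softmax(H @ u_l) for each label l, then scored.
    """

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # U[l] is the attention vector for label l; W[l] scores its pooled vector.
        self.U = nn.Parameter(torch.empty(num_labels, hidden_dim))
        self.W = nn.Parameter(torch.empty(num_labels, hidden_dim))
        self.bias = nn.Parameter(torch.zeros(num_labels))
        nn.init.xavier_uniform_(self.U)
        nn.init.xavier_uniform_(self.W)

    def plant_attention(self, planted: torch.Tensor) -> None:
        """Initialize ('plant') the label attention vectors.

        `planted` (num_labels x hidden_dim) stands in for attention vectors
        derived from a pretrained ranking model; how they are computed is
        not shown here.
        """
        with torch.no_grad():
            self.U.copy_(planted)

    def forward(self, H: torch.Tensor) -> torch.Tensor:
        # H: (batch, seq_len, hidden_dim) token representations.
        scores = torch.einsum("bsh,lh->bls", H, self.U)   # per-label token scores
        alpha = F.softmax(scores, dim=-1)                 # attention over tokens
        V = torch.einsum("bls,bsh->blh", alpha, H)        # label-specific doc vectors
        return (V * self.W).sum(-1) + self.bias           # per-label logits
```

In this toy setup, `plant_attention` would be called once before fine-tuning, so the attention weights start from informative label-token associations rather than a random initialization.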