In recent years, Transformer-based models such as the Switch Transformer have achieved remarkable results in natural language processing tasks. However, these models are often too complex and require extensive pre-training, which limits their effectiveness for small clinical text classification tasks with limited data. In this study, we propose a simplified Switch Transformer framework and train it from scratch on a small French clinical text classification dataset from CHU Sainte-Justine hospital. Our results demonstrate that the simplified, small-scale Transformer models outperform pre-trained BERT-based models, including DistilBERT, CamemBERT, FlauBERT, and FrALBERT. Moreover, the mixture-of-experts mechanism of the Switch Transformer helps capture diverse patterns, so the proposed approach achieves better results than a conventional Transformer relying on self-attention alone. Our proposed framework achieves 87\% accuracy, 87\% precision, and 85\% recall, compared to the third-best pre-trained BERT-based model, FlauBERT, which achieves 84\% accuracy, 84\% precision, and 84\% recall. However, Switch Transformers have limitations, including a generalization gap and sharp minima. We therefore also compare our model with a multi-layer perceptron neural network for classifying small French clinical narratives and show that the latter outperforms all other models.
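To make the mixture-of-experts routing concrete, the following is a minimal PyTorch sketch of a Switch-style top-1 expert layer, in which a learned router sends each token to a single expert feed-forward network. This is an illustrative assumption of how such a layer can be written, not the implementation used in this study; all class names, dimensions, and hyperparameters are hypothetical.
\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFeedForward(nn.Module):
    """Switch-style top-1 mixture-of-experts feed-forward layer (illustrative)."""

    def __init__(self, d_model, d_ff, n_experts):
        super().__init__()
        # Router produces one logit per expert for every token.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        # x: (batch, seq_len, d_model); flatten tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        probs = F.softmax(self.router(tokens), dim=-1)
        gate, expert_idx = probs.max(dim=-1)  # top-1 expert per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate value so the router receives gradients.
                out[mask] = gate[mask].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)

# Illustrative usage with made-up dimensions.
layer = SwitchFeedForward(d_model=128, d_ff=256, n_experts=4)
hidden = torch.randn(2, 32, 128)   # (batch, seq_len, d_model)
print(layer(hidden).shape)         # torch.Size([2, 32, 128])
\end{verbatim}
Because each token activates only one expert, different experts can specialize on different clinical patterns while keeping the per-token computation close to that of a single feed-forward block.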