Automatically associating ICD codes with electronic health data is a well-known NLP task in medical research. NLP has evolved significantly in recent years with the emergence of pre-trained language models based on the Transformer architecture, mainly for the English language. This paper adapts these models to the automatic association of ICD codes. We experimented with several neural network architectures to address the challenges of handling both a large number of input tokens and a large set of labels to predict. We propose a model that combines the latest advances in NLP and multi-label classification for ICD-10 code association. Fair experiments on a French-language clinical dataset show that our approach improves the $F_1$-score by more than 55\% compared to state-of-the-art results.