We present Hierarchical Transformer Networks for modeling long-term dependencies across clinical notes for patient-level prediction. The network is equipped with three levels of Transformer-based encoders that learn progressively from words to sentences, from sentences to notes, and finally from notes to a patient representation. The first level, from words to sentences, directly applies a pre-trained BERT model; the second and third levels each implement a stack of two Transformer encoder layers before the final patient representation is fed into a classification layer for clinical predictions. Compared to a standard BERT model, our model raises the maximum input length from 512 tokens to the much longer sequences found in clinical notes. We empirically experiment with different parameters to identify an optimal trade-off under computational resource limits. Experimental results on the MIMIC-III dataset across different prediction tasks demonstrate that our proposed hierarchical model outperforms previous state-of-the-art hierarchical neural networks.
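To make the three-level hierarchy concrete, the following is a minimal PyTorch sketch of the architecture as described above: a pre-trained BERT for word-to-sentence encoding, then two 2-layer Transformer encoders for sentence-to-note and note-to-patient aggregation, followed by a classification layer. The class name `HierarchicalTransformer`, the [CLS]/mean pooling choices, and all dimensions are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of the three-level hierarchy described in the abstract.
# Level 1 (words -> sentence) uses a pre-trained BERT; levels 2 and 3 each
# use a 2-layer Transformer encoder. Pooling strategies and hyperparameters
# here are assumptions for illustration only.
import torch
import torch.nn as nn
from transformers import BertModel


class HierarchicalTransformer(nn.Module):  # hypothetical name
    def __init__(self, num_classes: int, d_model: int = 768,
                 nhead: int = 8, dim_ff: int = 2048):
        super().__init__()
        # Level 1: pre-trained BERT encodes the words of each sentence.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Level 2: 2-layer encoder aggregates sentences into a note.
        sent_layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_ff, batch_first=True)
        self.sentence_encoder = nn.TransformerEncoder(sent_layer, num_layers=2)
        # Level 3: 2-layer encoder aggregates notes into a patient.
        note_layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_ff, batch_first=True)
        self.note_encoder = nn.TransformerEncoder(note_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids: (notes, sentences, words) for a single patient.
        n_notes, n_sents, n_words = input_ids.shape
        flat_ids = input_ids.view(-1, n_words)
        flat_mask = attention_mask.view(-1, n_words)
        # Level 1: take the [CLS] embedding as the sentence vector (assumed).
        out = self.bert(input_ids=flat_ids, attention_mask=flat_mask)
        sent_vecs = out.last_hidden_state[:, 0]           # (notes*sents, d)
        sent_vecs = sent_vecs.view(n_notes, n_sents, -1)  # (notes, sents, d)
        # Level 2: sentences -> note vectors (mean-pooled, assumed).
        note_vecs = self.sentence_encoder(sent_vecs).mean(dim=1)  # (notes, d)
        # Level 3: notes -> one patient representation.
        patient = self.note_encoder(note_vecs.unsqueeze(0)).mean(dim=1)  # (1, d)
        return self.classifier(patient)
```

This sketch processes one patient at a time; because each sentence passes through BERT independently, the effective input length scales with the number of sentences and notes rather than being capped at 512 tokens. Batching across patients would additionally require padding and masking at the note and sentence levels.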