In this report, we introduce SciFive, a domain-specific T5 model that has been pre-trained on large biomedical corpora. Our model outperforms the current SOTA methods (i.e., BERT, BioBERT, base T5) on tasks in named entity recognition, relation extraction, natural language inference, and question answering. We show that text-generation methods have significant potential in a broad array of biomedical NLP tasks, particularly those requiring longer, more complex outputs. Our results support the exploration of more difficult text-generation tasks and the development of new methods in this area.