Citation recommendation systems for the scientific literature help authors find papers that they should cite, and so have the potential to speed up discoveries and uncover new routes for scientific exploration. We treat citation recommendation as a ranking problem and tackle it with a two-stage approach: candidate generation followed by re-ranking. Within this framework, we adapt to the scientific domain a proven combination of "bag of words" retrieval followed by re-scoring with a BERT model. We show experimentally the effects of domain adaptation, both through pretraining on in-domain data and through the use of in-domain vocabulary. In addition, we introduce a novel navigation-based document expansion strategy to enrich the candidate documents processed by our neural models. On three collections from different scientific disciplines, we achieve the best reported results in the citation recommendation task.
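The two-stage approach described in the abstract (candidate generation followed by re-ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the abstract only says "bag of words" retrieval, so Okapi BM25 with common default parameters is used here as one plausible instance, and the hypothetical `overlap_scorer` is a simple token-overlap stand-in for the BERT re-scoring model, which would be plugged in through the `scorer` argument. The function names (`bm25_scores`, `generate_candidates`, `recommend`) are likewise invented for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def generate_candidates(query, docs, k):
    """Stage 1: indices of the k documents with the highest BM25 scores."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def overlap_scorer(query, doc):
    """Hypothetical stand-in for a neural re-scorer: Jaccard token overlap."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def recommend(query, docs, k=3, scorer=overlap_scorer):
    """Stage 2: re-rank the stage-1 candidates with the given scorer."""
    candidates = generate_candidates(query, docs, k)
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


if __name__ == "__main__":
    corpus = [
        "neural ranking models for document retrieval",
        "protein folding with deep learning",
        "citation recommendation with neural ranking",
        "weather patterns in the pacific",
    ]
    print(recommend("neural citation recommendation", corpus, k=2))
```

Keeping the re-scorer behind a plain `scorer(query, doc)` callable means the expensive model only ever sees the `k` candidates that survive the first stage, which is the usual motivation for a two-stage design.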