The rapid evolution of scientific research produces a huge volume of publications every year. Among the many measures used to quantify scientific impact, citation count stands out for its frequent use in the research community. Although peer review remains the main reliable way of predicting a paper's future impact, the ability to foresee lasting impact from citation records is increasingly important for scientific impact analysis in the era of big data. This paper focuses on long-term citation count prediction for individual publications, an emerging and challenging applied research topic. Previous studies of long-term scientific impact quantification have independently confirmed four key phenomena: the intrinsic quality of publications, the aging effect, the Matthew effect, and the recency effect. We unify the formulations of all these observations in this paper. Building on these formulations, we propose a long-term citation count prediction model for individual papers based on a recurrent neural network with long short-term memory units. Extensive experiments on a large real-world citation data set demonstrate that the proposed model consistently outperforms existing methods and achieves a significant performance improvement.
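The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea of feeding a paper's yearly citation history through a recurrent network with long short-term memory units. Everything in it is an assumption made for illustration: the class name `TinyLSTM`, the hidden size, the log transform of the inputs, the softplus readout, and the randomly initialised (untrained) weights are all hypothetical. It does not encode the four phenomena above and should not be read as the proposed model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with a scalar input, written out gate by gate.

    Hypothetical illustration only: weights are random and untrained.
    """

    def __init__(self, hidden_size=4, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size

        # One row per hidden unit: [input weight, recurrent weights..., bias].
        def init():
            return [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size + 2)]
                    for _ in range(hidden_size)]

        self.w_input, self.w_forget = init(), init()
        self.w_output, self.w_cand = init(), init()
        self.w_readout = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]

    def _gate(self, weights, x, h, activation):
        out = []
        for row in weights:
            z = row[0] * x + sum(w * hj for w, hj in zip(row[1:-1], h)) + row[-1]
            out.append(activation(z))
        return out

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden and cell states."""
        i = self._gate(self.w_input, x, h, sigmoid)    # input gate
        f = self._gate(self.w_forget, x, h, sigmoid)   # forget gate
        o = self._gate(self.w_output, x, h, sigmoid)   # output gate
        g = self._gate(self.w_cand, x, h, math.tanh)   # candidate cell state
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict(self, yearly_citations):
        """Read a citation history year by year and emit one non-negative value."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for count in yearly_citations:
            x = math.log1p(count)  # assumed transform to compress heavy-tailed counts
            h, c = self.step(x, h, c)
        z = sum(w * hj for w, hj in zip(self.w_readout, h))
        return math.log1p(math.exp(z))  # softplus keeps the output non-negative


model = TinyLSTM(hidden_size=4, seed=0)
prediction = model.predict([3, 10, 25, 18, 9])  # made-up yearly counts
```

With untrained weights the returned number carries no meaning; the sketch only shows how the cell state carries information across years, which is the property that makes this kind of network a natural fit for long-horizon citation sequences.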