As Internet technology develops, information overload is becoming increasingly pronounced, and users spend considerable time finding the information they need. Keyphrases, which concisely summarize a document's content, help users locate and understand documents quickly. For academic resources, most existing studies extract keyphrases from the title and abstract of a paper. We find that the titles of a paper's references also contain author-assigned keyphrases. This article therefore adds reference information to the source text and applies two typical unsupervised extraction methods (TF*IDF and TextRank), two representative traditional supervised learning algorithms (Naïve Bayes and Conditional Random Field), and a supervised deep learning model (BiLSTM-CRF) to analyze how reference information affects keyphrase extraction performance. The aim is to improve the quality of keyphrase recognition by expanding the source text. The experimental results show that reference information can increase the precision, recall, and F1 of automatic keyphrase extraction to a certain extent. This indicates that reference information is useful for keyphrase extraction from academic papers and suggests a new direction for future research on automatic keyphrase extraction.
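The idea of expanding the source text with reference titles, ranking candidates, and scoring them with precision, recall, and F1 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_source_text`, `tfidf_top_k`, `precision_recall_f1`), the single-word candidates, the smoothed IDF formula, the exact-match evaluation, and all toy strings are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_source_text(title, abstract, reference_titles, use_references):
    """Join title and abstract, optionally appending reference titles."""
    parts = [title, abstract]
    if use_references:
        parts.extend(reference_titles)
    return " ".join(parts)


def tfidf_top_k(doc_tokens, corpus_tokens, k):
    """Return the k words of doc_tokens with the highest TF*IDF score.

    corpus_tokens is a list of token lists used to estimate document
    frequency. The smoothed IDF used here is an assumed variant.
    """
    tf = Counter(doc_tokens)
    n_docs = len(corpus_tokens)
    doc_sets = [set(d) for d in corpus_tokens]
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in doc_sets if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(doc_tokens)) * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 between two keyphrase lists."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up inputs; they do not come from the paper's dataset.
    title = "Keyphrase extraction for academic papers"
    abstract = "We rank candidate terms taken from the paper text."
    reference_titles = ["A toy reference title about graph ranking of terms"]
    background = [tokenize("An unrelated toy document about image retrieval")]
    gold = ["keyphrase", "ranking"]

    for use_refs in (False, True):
        text = build_source_text(title, abstract, reference_titles, use_refs)
        doc = tokenize(text)
        predicted = tfidf_top_k(doc, background + [doc], k=5)
        print(use_refs, predicted, precision_recall_f1(predicted, gold))
```

Running the script compares the two settings (with and without reference titles) on the toy input; any difference it prints says nothing about the paper's reported results.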