With the rapid and continuous growth of academic publications, identifying high-quality research has become an increasingly pressing challenge. While recent methods that leverage Large Language Models (LLMs) for automated paper evaluation have shown great promise, they are often constrained by outdated domain knowledge and limited reasoning capabilities. In this work, we present PaperEval, a novel LLM-based framework for automated paper evaluation that addresses these limitations through two key components: 1) a domain-aware paper retrieval module that retrieves relevant concurrent work to support contextualized assessment of novelty and contributions, and 2) a latent reasoning mechanism that enables deep understanding of complex motivations and methodologies, together with comprehensive comparison against concurrent related work, to support more accurate and reliable evaluation. To guide the reasoning process, we introduce a progressive ranking optimization strategy that encourages the LLM to iteratively refine its predictions with an emphasis on relative comparison. Experiments on two datasets demonstrate that PaperEval consistently outperforms existing methods in both academic impact and paper quality evaluation. In addition, we deploy PaperEval in a real-world paper recommendation system for filtering high-quality papers. The system has gained strong engagement on social media, amassing over 8,000 subscribers, and many of the filtered papers have attracted over 10,000 views, demonstrating the practical effectiveness of PaperEval.
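To make the described pipeline concrete, the sketch below illustrates the high-level flow summarized above: retrieve concurrent work, contextualize the target paper with it, and iteratively refine an LLM-produced quality score with an emphasis on relative comparison. This is a minimal illustration under assumed interfaces; the function names, prompt wording, and the keyword-overlap retriever are hypothetical stand-ins, not the actual PaperEval implementation.

```python
# Hypothetical sketch of a PaperEval-style evaluation flow (not the authors' code).
# retrieve_concurrent_work, evaluate_paper, and llm_score are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Paper:
    title: str
    abstract: str


def retrieve_concurrent_work(paper: Paper, corpus: List[Paper], top_k: int = 5) -> List[Paper]:
    """Stand-in for domain-aware retrieval: rank corpus papers by keyword overlap."""
    query_terms = set(paper.abstract.lower().split())

    def overlap(candidate: Paper) -> int:
        return len(query_terms & set(candidate.abstract.lower().split()))

    return sorted(corpus, key=overlap, reverse=True)[:top_k]


def evaluate_paper(
    paper: Paper,
    corpus: List[Paper],
    llm_score: Callable[[str], float],  # assumed wrapper around an LLM call
    rounds: int = 3,
) -> float:
    """Assumed flow: build context from retrieved concurrent work, then iteratively
    refine the score by asking the LLM to compare the paper against that context,
    a rough analogue of the progressive, comparison-driven refinement described above."""
    context = retrieve_concurrent_work(paper, corpus)
    context_text = "\n".join(f"- {p.title}: {p.abstract}" for p in context)

    score = 0.0
    for _ in range(rounds):
        prompt = (
            f"Concurrent work:\n{context_text}\n\n"
            f"Target paper: {paper.title}\n{paper.abstract}\n\n"
            f"Previous score: {score:.2f}. Compare the target paper against the "
            "concurrent work and return a refined quality score in [0, 1]."
        )
        score = llm_score(prompt)
    return score
```

Any scoring backend can be plugged in through `llm_score`; the point of the sketch is only the structure of retrieval followed by iterative, comparison-based refinement.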