We introduce RealTime QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis (weekly in this version). RealTime QA inquires about the current world, and QA systems need to answer questions about novel events or information. It therefore challenges static, conventional assumptions in open domain QA datasets and pursues instantaneous applications. We build strong baseline models upon large pretrained language models, including GPT-3 and T5. Our benchmark is an ongoing effort, and this preliminary report presents real-time evaluation results over the past month. Our experimental results show that GPT-3 can often properly update its generation results based on newly retrieved documents, highlighting the importance of up-to-date information retrieval. Nonetheless, we find that GPT-3 tends to return outdated answers when retrieved documents do not provide sufficient information to find an answer. This suggests an important avenue for future research: can an open domain QA system identify such unanswerable cases and communicate with the user, or even with the retrieval module, to modify the retrieval results? We hope that RealTime QA will spur progress in instantaneous applications of question answering and beyond.