It is becoming increasingly difficult to distinguish truth from lies. Every day, politicians, media outlets, and public figures make conflicting claims, often about topics that can, in principle, be verified against structured data. For instance, statements about crime rates, economic growth, or healthcare can all be checked against official public records and structured datasets. Building a system that does this automatically would have sounded like science fiction just a few years ago. Yet, with the extraordinary progress in LLMs and agentic AI, it is now within reach. Still, there remains a striking gap between what is technically possible and what recent work demonstrates. Most existing verification systems operate only on small, single-table databases (typically a few hundred rows) that conveniently fit within an LLM's context window. In this paper we report our progress on Thucy, the first cross-database, cross-table multi-agent claim verification system that also provides concrete evidence for each verification verdict. Thucy remains completely agnostic to the underlying data sources before deployment and must therefore autonomously discover, inspect, and reason over all available relational databases to verify claims. Importantly, Thucy also reports the exact SQL queries that support its verdict (whether the claim is accurate or not), offering full transparency to expert users familiar with SQL. When evaluated on the TabFact dataset, the standard benchmark for fact verification over structured data, Thucy surpasses the previous state of the art by 5.6 percentage points in accuracy (94.3% vs. 88.7%).