Traditional DBMSs execute user- or application-provided SQL queries over relational data with strong semantic guarantees and advanced query optimization, but complex SQL is hard to write, and these systems handle only structured tables. Contemporary multimodal systems, which operate over relations as well as text, images, and even videos, either expose low-level controls that force users to manually invoke (and possibly write) machine learning UDFs within SQL, or offload execution entirely to black-box LLMs, sacrificing usability or explainability, respectively. We propose KathDB, a new system that combines relational semantics with the reasoning power of foundation models over multimodal data. KathDB also provides human-AI interaction channels during query parsing, execution, and result explanation, so that users can iteratively obtain explainable answers across data modalities.