Accurate question answering over real spreadsheets remains difficult: multi-row headers, merged cells, and unit annotations disrupt naive chunking, while rigid SQL views fail on files that lack a consistent schema. We present SQuARE, a hybrid retrieval framework with sheet-level, complexity-aware routing. It computes a continuous complexity score for each sheet from header depth and merge density, then routes each query either to structure-preserving chunk retrieval or to SQL over an automatically constructed relational representation. When confidence is low, a lightweight agent supervises retrieval, refinement, or combination of results across both paths. This design preserves header hierarchies, time labels, and units, so returned values remain faithful to the original cells and are straightforward to verify. Evaluated on multi-header corporate balance sheets, a heavily merged World Bank workbook, and diverse public datasets, SQuARE consistently surpasses single-strategy baselines and ChatGPT-4o in both retrieval precision and end-to-end answer accuracy while keeping latency predictable. Because retrieval is decoupled from model choice, the system is compatible with emerging tabular foundation models and offers a practical bridge toward more robust table understanding.
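As a rough illustration of the sheet-level routing described above, the following minimal sketch combines header depth and merge density into a single continuous score and picks a retrieval path. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the `SheetStats` fields, the equal weights, the depth cap of 4, the 0.4 threshold, and the choice to send high-complexity sheets to chunk retrieval and low-complexity sheets to SQL.

```python
from dataclasses import dataclass


@dataclass
class SheetStats:
    """Hypothetical per-sheet statistics; the field names are illustrative."""
    header_rows: int   # number of stacked header rows
    merged_cells: int  # cells that belong to a merged range
    total_cells: int   # cells in the sheet's used range


def complexity_score(stats: SheetStats,
                     max_header_depth: int = 4,
                     w_header: float = 0.5,
                     w_merge: float = 0.5) -> float:
    """Return a score in [0, 1]; the weights and depth cap are assumed values."""
    # Normalise header depth to [0, 1], capping very deep headers.
    depth = min(stats.header_rows, max_header_depth) / max_header_depth
    # Fraction of cells that are part of a merged range.
    density = stats.merged_cells / stats.total_cells if stats.total_cells else 0.0
    return w_header * depth + w_merge * density


def route(stats: SheetStats, threshold: float = 0.4) -> str:
    """Pick a retrieval path for a sheet; the threshold is an assumed value."""
    if complexity_score(stats) >= threshold:
        return "chunk_retrieval"  # structure-preserving chunks for complex sheets
    return "sql"                  # relational view for flat, regular sheets


flat_sheet = SheetStats(header_rows=1, merged_cells=0, total_cells=200)
merged_sheet = SheetStats(header_rows=3, merged_cells=60, total_cells=200)
print(route(flat_sheet), route(merged_sheet))
```

In this sketch a flat single-header sheet falls below the threshold and goes to SQL, while a sheet with three header rows and 30% merged cells is routed to chunk retrieval. A real system would calibrate the weights and threshold on held-out sheets rather than fix them by hand.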