Table Question Answering (Table QA) in real-world settings must operate over both structured databases and semi-structured tables containing textual fields. However, existing benchmarks are tied to fixed data formats and have not systematically examined how representation itself affects model performance. We present the first controlled study that isolates the role of table representation by holding content constant while varying structure. Using a verbalization pipeline, we generate paired structured and semi-structured tables, enabling direct comparisons across modeling paradigms. To support detailed analysis, we introduce RePairTQA, a diagnostic benchmark with splits along table size, join requirements, query complexity, and schema quality. Our experiments reveal consistent trade-offs: SQL-based methods achieve high accuracy on structured inputs but degrade on semi-structured data, LLMs exhibit flexibility but reduced precision, and hybrid approaches strike a balance, particularly under noisy schemas. These effects intensify with larger tables and more complex queries. Ultimately, no single method excels across all conditions, and we highlight the central role of representation in shaping Table QA performance. Our findings provide actionable insights for model selection and design, paving the way for more robust hybrid approaches suited for diverse real-world data formats.