Efficient data processing is increasingly vital, with query optimizers playing a fundamental role in translating SQL queries into optimal execution plans. Traditional cost-based optimizers, however, often generate suboptimal plans due to flawed heuristics and inaccurate cost models, leading to the emergence of Learned Query Optimizers (LQOs). To address challenges in existing LQOs, such as the inconsistency and suboptimality inherent in pairwise ranking methods, we introduce CARPO, a generic framework leveraging listwise learning-to-rank for context-aware query plan optimization. CARPO distinctively employs a Transformer-based model for holistic evaluation of candidate plan sets and integrates a robust hybrid decision mechanism, featuring Out-Of-Distribution (OOD) detection with a top-k fallback strategy to ensure reliability. Furthermore, CARPO can be seamlessly integrated with existing plan embedding techniques, demonstrating strong adaptability. Comprehensive experiments on TPC-H and STATS benchmarks demonstrate that CARPO significantly outperforms both native PostgreSQL and Lero, achieving a Top-1 Rate of 74.54% on the TPC-H benchmark compared to Lero's 3.63%, and reducing the total execution time to 3719.16 ms compared to PostgreSQL's 22577.87 ms.
翻译:高效数据处理日益重要,查询优化器在将SQL查询转换为最优执行计划中发挥着基础性作用。然而,传统的基于成本的优化器常因启发式规则缺陷和成本模型不准确而生成次优计划,这推动了学习型查询优化器的出现。针对现有学习型查询优化器中存在的挑战,例如成对排序方法固有的不一致性与次优性,本文提出CARPO——一个利用列表式学习排序实现上下文感知查询计划优化的通用框架。CARPO创新性地采用基于Transformer的模型对候选计划集进行整体评估,并集成了鲁棒的混合决策机制,该机制通过结合分布外检测与Top-k回退策略确保系统可靠性。此外,CARPO能够与现有计划嵌入技术无缝集成,展现出强大的适应性。在TPC-H与STATS基准测试上的综合实验表明,CARPO显著优于原生PostgreSQL与Lero系统:在TPC-H基准测试中Top-1命中率达到74.54%(Lero仅为3.63%),总执行时间降至3719.16毫秒(PostgreSQL为22577.87毫秒)。