Chunking strategies significantly affect the effectiveness of Retrieval-Augmented Generation (RAG) systems. Existing methods operate within a fixed-granularity paradigm that relies on static boundary identification, which limits their adaptability to diverse query requirements. This paper presents FreeChunker, a Cross-Granularity Encoding Framework that departs from the traditional chunking paradigm: it treats sentences as atomic units and replaces static chunk segmentation with flexible retrieval over arbitrary sentence combinations. This shift not only substantially reduces the computational overhead of semantic boundary detection but also improves adaptability to complex queries. Experiments on LongBench V2 show that FreeChunker achieves better retrieval performance than traditional chunking methods while being considerably more computationally efficient than existing approaches.
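The abstract does not specify how FreeChunker's cross-granularity encoder works. The following is only a minimal sketch of the general idea of treating sentences as atomic units: each sentence is encoded exactly once, and any contiguous sentence combination is then scored by composing those stored vectors, so no segmentation step has to run before retrieval. Everything here is an illustrative assumption and not the paper's method. The toy bag-of-words encoder, the mean-pooling composition rule, the restriction to contiguous spans, and the names `embed_sentence`, `mean_pool`, `retrieve_span` and `max_len` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed_sentence(sentence: str) -> Vector:
    """Hypothetical stand-in for a sentence encoder: L2-normalised bag of words."""
    counts = Counter(word.strip(".,?!").lower() for word in sentence.split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}


def mean_pool(vectors: List[Vector]) -> Vector:
    """Assumed composition rule: average precomputed sentence vectors into a span vector."""
    pooled: Dict[str, float] = {}
    for vec in vectors:
        for w, v in vec.items():
            pooled[w] = pooled.get(w, 0.0) + v / len(vectors)
    return pooled


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_span(sentences: List[str], query: str, max_len: int = 3) -> Tuple[int, int, float]:
    """Score every contiguous span of 1..max_len sentences; return (start, end_exclusive, score)."""
    sent_vecs = [embed_sentence(s) for s in sentences]  # each sentence is encoded exactly once
    q = embed_sentence(query)
    best = (0, 1, -1.0)
    for start in range(len(sentences)):
        for end in range(start + 1, min(start + max_len, len(sentences)) + 1):
            # Span vectors are composed from cached sentence vectors, never re-encoded.
            score = cosine(mean_pool(sent_vecs[start:end]), q)
            if score > best[2]:
                best = (start, end, score)
    return best


doc = [
    "FreeChunker treats sentences as atomic units.",
    "The weather today is sunny.",
    "Retrieval can combine arbitrary sentences.",
]
start, end, score = retrieve_span(doc, "sunny weather")
print(start, end, round(score, 3))  # prints: 1 2 0.632
```

In this sketch the retrieval granularity is chosen per query at search time rather than fixed in advance, which is the property the abstract emphasises. A real system would replace the toy encoder and mean pooling with a learned model.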