Retrieval-augmented generation (RAG) methods enhance the performance of LLMs by incorporating retrieved knowledge chunks into the generation process. However, the retrieval and generation steps place different requirements on these knowledge chunks: retrieval benefits from comprehensive information that improves retrieval accuracy, whereas excessively long chunks introduce redundant contextual information that diminishes both the effectiveness and efficiency of generation. Existing RAG methods nevertheless employ identical knowledge-chunk representations for both retrieval and generation, resulting in suboptimal performance. In this paper, we propose a heterogeneous RAG framework (\myname) that decouples the knowledge-chunk representations used for retrieval and generation, thereby improving LLMs in both effectiveness and efficiency. Specifically, we represent knowledge with short chunks to suit the generation step, while retrieval uses each chunk together with its contextual information from multi-granular views to improve retrieval accuracy. We further introduce an adaptive prompt tuning method that adapts the retrieval model to this heterogeneous retrieval-augmented generation process. Extensive experiments demonstrate that \myname achieves significant improvements over baselines.
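To make the decoupling concrete, the following is a minimal illustrative sketch (not the paper's implementation): each knowledge chunk keeps a short form that is placed in the generation prompt, but is indexed under several multi-granular context views for retrieval. All names here (\texttt{embed\_text}, \texttt{HeteroIndex}, \texttt{Entry}) are hypothetical, and the hash-based embedding is a stand-in for a real dense retriever.

\begin{verbatim}
from dataclasses import dataclass, field
import hashlib
import numpy as np

DIM = 64  # toy embedding size

def embed_text(text: str) -> np.ndarray:
    """Placeholder embedding; a real system would call a dense retriever."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(DIM)
    return v / np.linalg.norm(v)

@dataclass
class Entry:
    short_chunk: str  # what the LLM sees at generation time
    view_vectors: list = field(default_factory=list)  # multi-granular views

class HeteroIndex:
    def __init__(self):
        self.entries: list[Entry] = []

    def add(self, short_chunk: str, context_views: list[str]):
        # Index the chunk under every granularity (chunk alone, chunk plus
        # surrounding context, ...) so retrieval can exploit richer context.
        views = [short_chunk] + context_views
        self.entries.append(Entry(short_chunk, [embed_text(v) for v in views]))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed_text(query)
        # Score each entry by its best-matching view, but return only the
        # short chunk, keeping the generation prompt compact.
        scored = [(max(float(q @ v) for v in e.view_vectors), e.short_chunk)
                  for e in self.entries]
        scored.sort(reverse=True)
        return [chunk for _, chunk in scored[:k]]

index = HeteroIndex()
index.add("Marie Curie won two Nobel Prizes.",
          ["Marie Curie won two Nobel Prizes. She was awarded in Physics "
           "(1903) and Chemistry (1911) for her work on radioactivity."])
print(index.retrieve("Which prizes did Marie Curie receive?"))
\end{verbatim}

The design choice this sketch highlights is that the retrieval score may come from any of the richer context views, while the text handed to the LLM stays short.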