Attorneys are increasingly interested in moving beyond keyword and semantic search to find key information more efficiently during document review. Large language models (LLMs) are now seen as tools that let attorneys ask natural language questions of their data during review and receive accurate, concise answers. This study evaluates retrieval strategies within Microsoft Azure's Retrieval-Augmented Generation (RAG) framework to identify effective approaches for Early Case Assessment (ECA) in eDiscovery. During ECA, legal teams analyze data at the outset of a matter to gain a general understanding of it and to attempt to determine key facts and risks before beginning full-scale review. In this paper, we compare the performance of Azure AI Search's keyword, semantic, vector, hybrid, and hybrid-semantic retrieval methods. We then present the accuracy, relevance, and consistency of each method's AI-generated responses. Legal practitioners can use the results of this study to inform how they select RAG configurations in the future.