Ensuring transparency of data practices related to personal information is a core requirement of the General Data Protection Regulation (GDPR). However, large-scale compliance assessment remains challenging due to the complexity and diversity of privacy policy language. Manual audits are labour-intensive and inconsistent, while current automated methods often lack the granularity required to capture nuanced transparency disclosures. In this paper, we present a modular large language model (LLM)-based pipeline for fine-grained, word-level annotation of privacy policies with respect to GDPR transparency requirements. Our approach integrates LLM-driven annotation with passage-level classification, retrieval-augmented generation, and a self-correction mechanism to deliver scalable, context-aware annotations across 21 GDPR-derived transparency requirements. To support empirical evaluation, we compile a corpus of 703,791 English-language privacy policies and create a ground-truth sample of 200 manually annotated policies based on a comprehensive, GDPR-aligned annotation scheme. We propose a two-tiered evaluation methodology that captures both passage-level classification quality and span-level annotation quality, and we conduct a comparative analysis of seven state-of-the-art LLMs on two annotation schemes, including that of the widely used OPP-115 dataset. Our evaluation shows that decomposing the annotation task and integrating targeted retrieval and classification components significantly improve annotation accuracy, particularly for well-structured requirements. Our work provides new empirical resources and methodological foundations for advancing automated transparency compliance assessment at scale.