Advanced Persistent Threats (APTs) are stealthy cyberattacks that often evade detection in system-level audit logs. Provenance graphs model these logs as system entities connected by events, revealing relationships that linear log representations miss. Existing systems apply anomaly detection to these graphs but often suffer from high false-positive rates and coarse-grained alerts. Their reliance on node attributes such as file paths or IP addresses leads to spurious correlations, which reduces detection robustness and reliability. In addition, to fully understand an attack's progression and impact, security analysts need systems that can generate accurate, human-like narratives of the entire attack. To address these challenges, we introduce OCR-APT, a system for APT detection and for reconstructing human-like attack stories. OCR-APT uses Graph Neural Networks (GNNs) for subgraph anomaly detection, learning behavior patterns around nodes rather than relying on fragile attributes such as file paths or IPs, which makes anomaly detection more robust. It then iterates over the detected subgraphs with Large Language Models (LLMs) to reconstruct multi-stage attack stories, validating each stage before proceeding to reduce hallucinations and produce an interpretable final report. Our evaluations on the DARPA TC3, OpTC, and NODLINK datasets show that OCR-APT outperforms state-of-the-art systems in both detection accuracy and alert interpretability. Moreover, OCR-APT reconstructs human-like reports that comprehensively capture the attack story.
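To make the contrast between behavior-based and attribute-based node representations concrete, the following is a minimal Python sketch, not OCR-APT's implementation. It assumes a hypothetical audit-event format of `(subject, action, object)` triples and uses simple counts of neighboring event types as a hand-crafted stand-in for learned GNN embeddings. The helper names `build_provenance_graph`, `behavior_features`, and `k_hop_subgraph` are illustrative only. The k-hop extraction shows the kind of neighborhood around a node that a subgraph-level detector would score.

```python
from collections import Counter, defaultdict


def build_provenance_graph(events: list[tuple[str, str, str]]) -> dict:
    """Build an adjacency map: node -> list of (action, neighbor, direction).

    Each (subject, action, object) event becomes an edge; it is recorded on
    both endpoints so neighborhoods can be walked in either direction.
    """
    graph = defaultdict(list)
    for subj, action, obj in events:
        graph[subj].append((action, obj, "out"))
        graph[obj].append((action, subj, "in"))
    return graph


def behavior_features(graph: dict, node: str) -> Counter:
    """Count (direction, action) pairs around a node.

    The neighbor names (file paths, IPs, process names) are deliberately
    ignored, so renaming an entity does not change its features.
    """
    return Counter((direction, action) for action, _, direction in graph[node])


def k_hop_subgraph(graph: dict, seed: str, k: int) -> set[str]:
    """Return the nodes within k hops of seed, ignoring edge direction."""
    seen, frontier = {seed}, {seed}
    for _ in range(k):
        nxt = set()
        for node in frontier:
            for _, neighbor, _ in graph.get(node, []):
                if neighbor not in seen:
                    nxt.add(neighbor)
        seen |= nxt
        frontier = nxt
    return seen


# Hypothetical events for illustration only.
events = [
    ("bash", "exec", "curl"),
    ("curl", "connect", "10.0.0.5"),
    ("curl", "write", "/tmp/x"),
    ("bash", "read", "/etc/passwd"),
]
g = build_provenance_graph(events)
```

Because `behavior_features` never looks at entity names, a process that behaves like `curl` but is named `wget` and contacts a different IP receives the same feature vector. This is the property the abstract attributes to learning behavior patterns instead of fragile attributes.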