面向可读性与自然语法的黑盒上下文无关文法推断 (Black-box Context-free Grammar Inference for Readable & Natural Grammars)

Black-box context-free grammar inference is crucial for program analysis, reverse engineering, and security, yet existing tools such as Arvada, TreeVada, and Kedavra struggle with scalability, readability, and accuracy on large, complex languages. We present NatGI, a novel LLM-guided grammar inference framework that extends TreeVada's parse tree recovery with three key innovations: bracket-guided bubble exploration, LLM-driven bubble generation and non-terminal labeling, and hierarchical delta debugging (HDD) for systematic tree simplification. Bracket-guided exploration leverages syntactic cues such as parentheses to propose well-structured grammar fragments, while LLM guidance produces meaningful non-terminal names and selects more promising merges. Finally, HDD incrementally reduces unnecessary rules, which makes the grammars both compact and interpretable. In our experiments, we evaluate NatGI on a comprehensive benchmark suite ranging from small languages to larger ones such as lua, c, and mysql. Our results show that NatGI consistently outperforms strong baselines in terms of F1 score. On average, NatGI achieves an F1 score of 0.57, which is 25pp (percentage points) higher than the best-performing baseline, TreeVada. In the case of interpretability, our generated grammars perform significantly better than those produced by existing approaches. Leveraging LLM-based node renaming and bubble exploration, NatGI produces rules with meaningful non-terminal names and compact structures that align more closely with human intuition. As a result, developers and researchers can achieve higher accuracy while still being able to easily inspect, verify, and reason about the structure and semantics of the induced grammars.

翻译：黑盒上下文无关文法推断对于程序分析、逆向工程和安全领域至关重要，然而现有工具如Arvada、TreeVada和Kedavra在处理大规模复杂语言时，在可扩展性、可读性和准确性方面存在不足。我们提出NatGI，一种新颖的基于大语言模型（LLM）引导的文法推断框架，该框架通过三项关键创新扩展了TreeVada的解析树恢复能力：括号引导的语法片段探索、LLM驱动的语法片段生成与非终结符命名，以及用于系统化树简化的分层增量调试（HDD）。括号引导的探索利用括号等句法线索来提出结构良好的文法片段，而LLM引导则生成有意义的非终结符名称并选择更具潜力的合并操作。最后，HDD逐步减少不必要的规则，使得生成的文法既紧凑又可解释。在我们的实验中，我们在一个涵盖从小型语言到较大型语言（如lua、c和mysql）的综合基准测试套件上评估了NatGI。结果表明，在F1分数方面，NatGI始终优于现有强基线方法。平均而言，NatGI实现了0.57的F1分数，比表现最佳的基线方法TreeVada高出25个百分点。在可解释性方面，我们生成的文法显著优于现有方法生成的文法。通过利用基于LLM的节点重命名和语法片段探索，NatGI生成的规则具有有意义的非终结符名称和紧凑的结构，更贴近人类直觉。因此，开发者和研究人员能够在获得更高准确性的同时，仍能轻松检查、验证和推理所推导文法的结构与语义。