Memelang: An Axial Grammar for LLM-Generated Vector-Relational Queries

Structured generation for LLM tool use highlights the value of compact domain-specific-language (DSL) intermediate representations (IRs) that an LLM can emit directly and a parser can process deterministically. This paper introduces axial grammar: linear token sequences whose multi-dimensional structure is recovered from the placement of rank-specific separator tokens. A single left-to-right pass assigns each token a coordinate in an n-dimensional grid, enabling deterministic parsing without parentheses or clause-heavy surface syntax. We instantiate this grammar in Memelang, a compact query language intended as an LLM-emittable IR whose fixed coordinate roles map directly to table/column/value slots. Memelang supports coordinate-stable relative references, parse-time variable binding, and implicit context carry-forward to reduce repetition in LLM-produced queries. It also encodes grouping, aggregation, and ordering via inline tags on value terms, so that grouped execution plans can be derived in one streaming pass over the coordinate-indexed representation. We provide a reference lexer/parser and a compiler that emits parameterized PostgreSQL SQL (optionally using pgvector operators).
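To illustrate the single-pass coordinate assignment described above, the following is a minimal sketch in Python. It is not the reference lexer/parser: the separator tokens (`;` for rank 1, `;;` for rank 2), the reset-lower-axes rule, and the function name `assign_coordinates` are assumptions chosen for illustration, and Memelang's actual separators and semantics may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical separator tokens mapped to their rank (axis index).
# These are illustrative only and are not claimed to be Memelang's syntax.
SEPARATORS: Dict[str, int] = {";": 1, ";;": 2}


def assign_coordinates(
    tokens: List[str], separators: Dict[str, int] = SEPARATORS
) -> List[Tuple[Tuple[int, ...], str]]:
    """Assign each non-separator token a grid coordinate in one pass.

    Assumed rule: an ordinary token is placed at the current coordinate
    and then advances axis 0; a separator of rank r advances axis r and
    resets every lower axis to 0.
    """
    n_axes = max(separators.values()) + 1
    coord = [0] * n_axes
    placed: List[Tuple[Tuple[int, ...], str]] = []
    for tok in tokens:
        rank = separators.get(tok)
        if rank is None:
            # Ordinary token: record it, then step along axis 0.
            placed.append((tuple(coord), tok))
            coord[0] += 1
        else:
            # Separator: step along its own axis, reset lower axes.
            coord[rank] += 1
            for axis in range(rank):
                coord[axis] = 0
    return placed


if __name__ == "__main__":
    query = "movies title Alien ; movies year 1979 ;; actors name Weaver"
    for position, token in assign_coordinates(query.split()):
        print(position, token)
```

Under these assumptions, coordinates are written as (axis 0, axis 1, axis 2). The tokens before the first `;` sit at axis-0 positions 0 to 2 with the higher axes at 0, the tokens after `;` share axis-1 index 1, and the tokens after `;;` share axis-2 index 1 with axis 1 reset to 0. No parentheses or lookahead are needed, because each token's position depends only on the separators seen so far.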
